Functional clustering of proteins to map out the protein shape space
ORAL
Abstract
Understanding the relationship between sequence, structure, and function in proteins is a persistent challenge with implications throughout science and medicine. Within this challenge, predicting the function of poorly characterized structures remains difficult, and a primary obstacle is identifying the critical sites that contribute to a protein's function. Here we present a method for protein function classification and functional-site prediction. Given a query structure, our scheme identifies residues with high classifying power, which correspond to functional sites, using feature-space embeddings extracted from a network pre-trained on protein structures. This network respects essential physical symmetries, yielding maximally informative and rotationally invariant embeddings. Candidate proteins are then clustered based on their similarity to these classifying residues. Specifically, we study protein kinase families and use established CATH superfamilies to validate the output clusters. We demonstrate that our approach accurately clusters structures from unseen families across a large group of protein kinases. This work can help shed light on the structure of the protein universe, with potential for grouping previously uncharacterized structures and aiding efforts to describe the landscape of protein shape space.
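The clustering step described above can be illustrated with a minimal sketch. It assumes that per-residue embeddings have already been extracted from the pre-trained, rotation-invariant network and that each query residue already carries a classifying score. The function names (`top_k_residues`, `similarity_profile`, `cluster`), the top-k selection, the cosine-similarity profile, and single-linkage clustering with a distance threshold are all hypothetical choices made for illustration. They are not the authors' pipeline.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def top_k_residues(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k query residues with the highest (hypothetical) classifying scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def similarity_profile(key_residues: List[Vector], candidate: List[Vector]) -> List[float]:
    """For each key query residue, the best cosine match anywhere in the candidate protein."""
    return [max(cosine(r, c) for c in candidate) for r in key_residues]


def cluster(profiles: Dict[str, List[float]], threshold: float) -> List[List[str]]:
    """Single-linkage clustering: link two proteins whose profiles are within
    `threshold` Euclidean distance, then return the connected components."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(profiles[a], profiles[b]) <= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real per-residue embeddings would be high-dimensional.
    query = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    scores = [0.9, 0.1, 0.8]  # hypothetical per-residue classifying scores
    key = [query[i] for i in top_k_residues(scores, k=2)]
    candidates = {
        "kinA": [[1.0, 0.0], [0.0, 1.0]],
        "kinB": [[1.0, 0.1], [0.1, 1.0]],
        "other": [[-1.0, 0.0], [0.0, -1.0]],
    }
    profiles = {name: similarity_profile(key, res) for name, res in candidates.items()}
    print(cluster(profiles, threshold=0.1))  # [['kinA', 'kinB'], ['other']]
```

The profile takes the maximum over candidate residues, so it does not depend on residue numbering or on a sequence alignment. The clustering itself is deliberately simple, and a real analysis would likely substitute a more robust method.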
* This work has been supported by the National Institutes of Health MIRA award (R35 GM142795), the CAREER award from the National Science Foundation (grant No: 2045054), and the National Science Foundation NRT Accelerating Quantum-Enabled Technologies Fellowship (award No: 2021540).
Presenters
Ella A Carlander (University of Washington)

Authors
Ella A Carlander (University of Washington)
Uchenna D Nwaege (University of California, Riverside)
Gian Marco Visani (University of Washington)
Michael N Pun (University of Washington)
Armita Nourmohammad (University of Washington)