Energy-Based Models Capture Pairwise and Higher-Order Interactions in Protein Sequence Data
ORAL
Abstract
Understanding protein structure, evolution and function requires reliable inference of interacting units in folded proteins. Here we present a unifying approach for inferring two of the most important structural units of proteins: pairwise contacts and higher-order strongly correlated units, known as sectors. Our method is a hybrid energy-based model that combines a pairwise-energy term, as used in state-of-the-art Direct Coupling Analysis, with a Restricted Boltzmann Machine (RBM) term meant to capture higher-order interactions. We show that, when trained on data from a biologically informed ground-truth model, our algorithms learn both the pairwise and the higher-order structure and are robust to varying levels of undersampling and to varying interaction strengths in the ground-truth distribution. We carry out the analysis for 2-spin and 10-spin systems with the Minimum Probability Flow and Ratio Matching algorithms, respectively. We comment on why the RBM is successful at modeling the higher-order interactions and on why certain hyperparameter choices (number of hidden units in the RBM, regularization strength) support the model's feature-detection capabilities.
*Supported in part by the National Science Foundation, through the Center for the Physics of Biological Function (PHY-1734030), by the National Institutes of Health BRAIN initiative (R01EB026943-01), by the Simons Foundation, and by the Sloan Foundation.
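The hybrid energy described in the abstract can be illustrated with a minimal sketch for the binary case. Everything below is an assumption made for illustration, not the authors' implementation: spins and hidden units take values ±1, the hidden units are summed out analytically (each contributes a log(2 cosh) term), and the names `hybrid_energy`, `mpf_objective`, `h`, `J`, `W` and `c` are hypothetical. The objective is the standard single-spin-flip form of Minimum Probability Flow; regularization is omitted.

```python
import numpy as np


def hybrid_energy(s, h, J, W, c):
    """Energy of +/-1 spin configurations under an assumed hybrid model.

    s : (..., L) array of +/-1 spins
    h : (L,) local fields
    J : (L, L) symmetric couplings with zero diagonal (pairwise term)
    W : (M, L) RBM weights, c : (M,) hidden biases (higher-order term)
    """
    s = np.asarray(s, dtype=float)
    field = s @ h
    # Factor 0.5 because the symmetric J counts each pair (i, j) twice.
    pair = 0.5 * np.einsum("...i,ij,...j->...", s, J, s)
    # Summing out a +/-1 hidden unit with input x gives log(2 cosh x),
    # computed stably as logaddexp(x, -x).
    x = s @ W.T + c
    rbm = np.sum(np.logaddexp(x, -x), axis=-1)
    return -(field + pair + rbm)


def mpf_objective(data, h, J, W, c):
    """Minimum Probability Flow objective with single-spin-flip neighbours.

    K = (1/N) * sum over data x and single-flip neighbours x' of
        exp((E(x) - E(x')) / 2).
    """
    data = np.asarray(data, dtype=float)
    e_data = hybrid_energy(data, h, J, W, c)
    total = 0.0
    for i in range(data.shape[1]):
        flipped = data.copy()
        flipped[:, i] *= -1  # flip spin i in every sample
        e_flip = hybrid_energy(flipped, h, J, W, c)
        total += np.sum(np.exp(0.5 * (e_data - e_flip)))
    return total / data.shape[0]
```

In this sketch the couplings `J` play the role of the pairwise (Direct Coupling Analysis style) term, while each row of `W` defines a collective feature over many sites, which is one way a small number of hidden units could represent a sector. Minimizing `mpf_objective` over the parameters avoids computing the partition function, since only energy differences between data points and their one-flip neighbours appear.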
Presenters
- Peter Fields, University of Chicago