Energy-Based Models Capture Pairwise and Higher-Order Interactions in Protein Sequence Data

ORAL

Abstract

Understanding protein structure, evolution, and function requires reliable inference of interacting units in folded proteins. Here we present a unifying approach for inferring two of the most important structural units of proteins: pairwise contacts and higher-order, strongly correlated units known as sectors. Our method is a hybrid energy-based model that combines a pairwise-energy term, as used in state-of-the-art Direct Coupling Analysis, with a Restricted Boltzmann Machine (RBM) term meant to capture higher-order interactions. We show that, when trained on data from a biologically informed ground-truth model, our algorithms learn both the pairwise and the higher-order structure and are robust to varying levels of undersampling and to varying interaction strengths in the ground-truth distribution. We carry out the analysis for 2-spin and 10-spin systems with the Minimum Probability Flow and Ratio Matching algorithms, respectively. We comment on why the RBM is successful at modeling the higher-order interactions and on why certain hyperparameter choices (number of hidden units in the RBM, regularization strength) support the model's feature-detection capabilities.
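As an illustration of what a hybrid energy of this kind can look like, the sketch below writes down one common form for binary (+1/-1) sites: a local-field term, a pairwise-coupling term, and an RBM term with the hidden units summed out analytically. The abstract does not give the authors' exact parameterization, so the function name `hybrid_energy`, the symbols `h`, `J`, `W`, `b`, and the choice of +1/-1 hidden units are all assumptions made for this example.

```python
import numpy as np

def hybrid_energy(s, h, J, W, b):
    """Energy of a +1/-1 configuration s under an assumed pairwise + RBM model.

    h : (N,)   local fields (hypothetical name)
    J : (N, N) symmetric couplings with zero diagonal (hypothetical name)
    W : (M, N) visible-to-hidden weights (hypothetical name)
    b : (M,)   hidden-unit biases (hypothetical name)
    """
    s = np.asarray(s, dtype=float)
    field_term = -h @ s
    # J is symmetric with zero diagonal, so the factor 1/2 counts each pair once.
    pairwise_term = -0.5 * s @ J @ s
    # Summing out +1/-1 hidden units gives -sum_mu log(2 cosh(b_mu + W_mu . s)).
    hidden_input = b + W @ s
    # log(2 cosh x) = log(e^x + e^-x), computed stably with logaddexp.
    rbm_term = -np.sum(np.logaddexp(hidden_input, -hidden_input))
    return field_term + pairwise_term + rbm_term
```

In this form the pairwise term plays the role of the Direct Coupling Analysis couplings, while each hidden unit couples to many sites at once through a row of `W`, which is how an RBM term can represent correlations beyond pairs.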

*Supported in part by the National Science Foundation, through the Center for the Physics of Biological Function (PHY-1734030), by the National Institutes of Health BRAIN initiative (R01EB026943-01), by the Simons Foundation, and by the Sloan Foundation.

Presenters

  • Peter Fields

    • University of Chicago

Authors

  • Peter Fields

    • University of Chicago
  • Vudtiwat Ngampruetikorn

    • The Graduate Center, CUNY
  • Stephanie E Palmer

    • University of Chicago
  • David J Schwab

    • The Graduate Center, CUNY