Energy-Based Models Capture Pairwise and Higher-Order Interactions in Protein Sequence Data

ORAL

Abstract

Understanding protein structure, evolution and function requires reliable inference of the interacting units in folded proteins. Here we present a unifying approach for inferring two of the most important structural units of proteins: pairwise contacts and higher-order, strongly correlated units known as sectors. Our method is a hybrid energy-based model that combines a pairwise energy term, as used in state-of-the-art Direct Coupling Analysis (DCA), with a Restricted Boltzmann Machine (RBM) term intended to capture higher-order interactions. We show that, when trained on data from a biologically informed ground-truth model, our algorithms learn both the pairwise and the higher-order structure and are robust to varying degrees of undersampling and interaction strength in the ground-truth distribution. We carry out the analysis for 2-spin and 10-spin systems using the Minimum Probability Flow and Ratio Matching algorithms, respectively. Finally, we discuss why the RBM succeeds at modeling the higher-order interactions and why certain hyperparameter choices (the number of hidden units in the RBM and the regularization strength) favor the model's feature-detection capabilities.
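The hybrid energy and the Minimum Probability Flow (MPF) objective can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that the "2-spin" case means two-state spins s_i in {-1, +1}, that the RBM hidden units are binary {0, 1} and can therefore be summed out analytically (giving a softplus term), and that MPF uses single-spin-flip neighbours. The function names hybrid_energy and mpf_objective and the parameter names h, J, W, b are hypothetical. Regularization and the Ratio Matching variant used for the 10-spin case are omitted.

```python
import itertools

import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def hybrid_energy(S, h, J, W, b):
    """Energy of spin configurations under an assumed pairwise + RBM model.

    S : (B, N) spins in {-1, +1}   (assumed two-state spins)
    h : (N,)   local fields
    J : (N, N) symmetric pairwise couplings, zero diagonal (DCA-like term)
    W : (N, M) visible-to-hidden RBM weights
    b : (M,)   hidden biases; hidden units assumed binary {0, 1}
    """
    field = S @ h
    # With symmetric J and zero diagonal, 0.5 * s^T J s sums each pair i<j once.
    pair = 0.5 * np.einsum("bi,ij,bj->b", S, J, S)
    # Summing out each {0, 1} hidden unit yields log(1 + exp(b_mu + s . W_mu)).
    rbm = softplus(S @ W + b).sum(axis=1)
    return -(field + pair + rbm)


def mpf_objective(S, h, J, W, b):
    """MPF objective with single-spin-flip connectivity (assumed):

    K = (1/B) * sum_data sum_i exp((E(s) - E(s with spin i flipped)) / 2)
    """
    E = hybrid_energy(S, h, J, W, b)
    B, N = S.shape
    K = 0.0
    for i in range(N):
        S_flip = S.copy()
        S_flip[:, i] *= -1  # flip spin i in every sample
        K += np.exp(0.5 * (E - hybrid_energy(S_flip, h, J, W, b))).sum()
    return K / B
```

In practice one would minimize K (plus penalties on W, J) with an automatic-differentiation library. The number of hidden units M and the regularization strength are the hyperparameters the abstract refers to. Summing out the hidden units keeps the model a function of the visible spins alone, so MPF applies without sampling.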

Presenters

  • Peter Fields

    University of Chicago

Authors

  • Peter Fields

    University of Chicago

  • Vudtiwat Ngampruetikorn

    The Graduate Center, CUNY

  • Stephanie E Palmer

    University of Chicago

  • David J Schwab

    The Graduate Center, CUNY