Physics-based machine learning decodes sequence-conformation link in the disordered proteome

ORAL

Abstract

Over a third of the human proteome is disordered. Unlike folded proteins, these intrinsically disordered proteins (IDPs) interconvert between many different conformations. Predicting the conformations of these dynamic proteins is needed to understand their function, but is challenging because the relevant experiments are highly specialized. Machine learning (ML) models such as AlphaFold struggle to predict IDP conformations because of limited data and the ensemble nature of IDPs. We integrate polymer-physics-based analytical theory, simulation, and ML to decode the link between disordered protein sequence and conformation. Starting with a coarse-grained, analytically tractable Hamiltonian, we model sequence-dependent electrostatics mathematically, while non-electrostatic interactions are extracted from simulations of many sequences and then learned with ML. The resulting Hamiltonian, which combines physics-based electrostatic patterning with machine-learned non-electrostatic patterning, can predict sequence-dependent conformational properties of IDPs. This physics-based machine learning (PML) approach yields models that are as accurate as simulation, with throughput comparable to ML. At the same time, it goes beyond traditional ML by allowing prediction of observables not included in the training data. Predictions made using this fast, high-throughput approach can help model phase separation, complexation between disordered proteins, and proteome conformations, with the aim of understanding evolution and designing new proteins.
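
As an illustration of what an analytical description of sequence-dependent electrostatics can look like, the sketch below computes the sequence charge decoration (SCD), a charge-patterning metric from the polymer-physics literature. The abstract does not state that SCD is the quantity used in this work; the function name, the simple charge table (K, R = +1; D, E = -1; all other residues neutral), and the example sequences are assumptions chosen for illustration, and this is not the Hamiltonian described above.

```python
import math

# Assumed residue charges at neutral pH (illustrative simplification:
# histidine and the chain termini are treated as neutral).
RESIDUE_CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def sequence_charge_decoration(sequence: str) -> float:
    """Return SCD = (1/N) * sum_{m>n} q_m * q_n * |m - n|**0.5.

    Hypothetical helper for illustration only; it is not taken from
    the authors' code.
    """
    charges = [RESIDUE_CHARGE.get(res, 0.0) for res in sequence.upper()]
    n_res = len(charges)
    if n_res == 0:
        raise ValueError("sequence must contain at least one residue")

    total = 0.0
    for m in range(1, n_res):
        if charges[m] == 0.0:
            continue  # neutral residues contribute nothing
        for n in range(m):
            total += charges[m] * charges[n] * math.sqrt(m - n)
    return total / n_res


if __name__ == "__main__":
    # Same composition, different charge patterning.
    for seq in ("EKEKEKEK", "EEEEKKKK"):
        print(seq, round(sequence_charge_decoration(seq), 3))
```

In this sketch, a blocky sequence such as EEEEKKKK gives a more negative SCD than the well-mixed EKEKEKEK, even though the two have identical composition. That is the sense in which conformational properties can depend on sequence patterning and not only on amino acid content.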

*This work is funded by the NIH under Grant No. R01GM138901.

Publication: Houston L, Phillips M, Torres A, Gaalswyk K, Ghosh K. Physics-Based Machine Learning Trains Hamiltonians and Decodes the Sequence-Conformation Relation in the Disordered Proteome. J Chem Theory Comput. 2024 Nov 26;20(22):10266-10274. doi: 10.1021/acs.jctc.4c01114. Epub 2024 Nov 6. PMID: 39504303; PMCID: PMC12257546.

Presenters

  • Lilianna Houston

    • University of Denver

Authors

  • Lilianna Houston

    • University of Denver

  • Kingshuk Ghosh

    • University of Denver