Symmetry-informed Machine Learning for Deciphering Multicomponent Intrinsically Disordered Protein (IDP) Interactions
ORAL
Abstract
Intrinsically disordered proteins (IDPs) are widely observed to drive condensate formation in the cell, but predicting phase behavior from IDP sequences remains a challenge. Previous work has employed either physics-based mean-field models that lack quantitative accuracy or large ML classifiers that accurately predict whether a condensate forms but give little physical insight. We propose a method that balances this accuracy-interpretability tradeoff by building a maximally expressive model that still obeys physical laws. The key advance is that we do not predict phase diagrams directly. Instead, we build a model that predicts the free energies of IDP mixtures by training on an accessible thermodynamic quantity, in this case the equation of state (osmotic pressure). Our model consists of an encoder that maps each sequence onto a low-dimensional feature space, and a decoder that learns thermodynamic interactions and predicts the equation of state of an IDP mixture from the features and concentrations of its components. Importantly, the encoder encodes a sequence independently of the mixture in which it is found. This greatly reduces the number of parameters in the model and allows training with sparse data. Moreover, we require the decoder to obey the physical symmetries of a free energy landscape. These symmetries not only enable the model to generalize to mixtures with any number of species, but also improve its interpretability, because the chemical potential of a sequence in a mixture is always a linear projection of its feature-space representation. We demonstrate that our model accurately predicts pressure, free energy, and phase diagrams, and that it provides insight into the physics behind rich IDP-mixture phase behaviors.
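The encoder-decoder structure described above can be illustrated with a minimal sketch. This is not the authors' model. It assumes (hypothetically) that the excess free energy density depends on the mixture only through the concentration-weighted sum of feature vectors, m = sum_i c_i x_i, which is one simple way to obtain permutation symmetry, support for any number of species, and an excess chemical potential that is a linear projection of each sequence's features. The toy encoder, the matrix `A`, the constant `B`, and all function names are invented for illustration, and energies are in units of kT.

```python
import numpy as np

# Hypothetical per-residue charges used by the toy encoder (illustration only).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}
HYDROPHOBIC = set("AILMFVWY")

def encode(sequence):
    """Toy encoder: map one sequence to a 2-d feature vector.
    It sees only the sequence, never the mixture it is placed in."""
    n = len(sequence)
    net_charge = sum(CHARGE.get(a, 0.0) for a in sequence) / n
    hydrophobicity = sum(a in HYDROPHOBIC for a in sequence) / n
    return np.array([net_charge, hydrophobicity])

# Hypothetical decoder parameters: a symmetric interaction matrix and a
# quartic coefficient. A trained model would learn a far richer function.
A = np.array([[1.0, 0.0], [0.0, -2.0]])
B = 0.5

def excess_free_energy(m):
    """Toy excess free energy density F(m) as a function of the mixture vector m."""
    return 0.5 * m @ A @ m + 0.25 * B * (m @ m) ** 2

def grad_excess(m):
    """Analytic gradient of F with respect to m."""
    return A @ m + B * (m @ m) * m

def free_energy(features, conc):
    """Ideal-mixing term plus excess term; invariant to relabeling species."""
    m = conc @ features
    return np.sum(conc * (np.log(conc) - 1.0)) + excess_free_energy(m)

def chemical_potentials(features, conc):
    """mu_i = df/dc_i = ln c_i + x_i . grad F(m): the excess part is a
    linear projection of each species' feature vector x_i."""
    m = conc @ features
    return np.log(conc) + features @ grad_excess(m)

def osmotic_pressure(features, conc):
    """Equation of state from P = sum_i c_i mu_i - f."""
    m = conc @ features
    return conc.sum() + m @ grad_excess(m) - excess_free_energy(m)

# Example: a two-component mixture (sequences are made up).
features = np.stack([encode("KKEEAAGG"), encode("DDDDVVGS")])
conc = np.array([0.1, 0.2])
pressure = osmotic_pressure(features, conc)
mu = chemical_potentials(features, conc)
```

Because the decoder in this sketch only ever sees the summed vector m, adding a third species requires no new parameters: one more row in `features` and one more entry in `conc` suffice.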
* NIH R35GM155017
Presenters
Beijia Yuan, Princeton University