Interpretable prediction of multicomponent IDR phase separation with quantitative accuracy
ORAL
Abstract
Cells contain multiple biomolecular condensates with distinct compositions that compartmentalize diverse biochemical processes. Because condensate formation is largely driven by interactions between intrinsically disordered regions (IDRs) of proteins, accurate modeling of the multicomponent phase behavior of IDR-containing proteins is essential for elucidating the compositional specificity of condensates. However, in multicomponent systems, thermodynamic observables are emergent properties of mixtures of multiple IDR sequences and depend strongly on IDR concentrations, which makes accurate prediction particularly challenging. To address this, we established a workflow that integrates high-throughput molecular dynamics (MD) simulations with machine learning (ML) to decipher multicomponent IDR interactions. Our approach encodes copolymer sequences into a shared low-dimensional latent space that captures the sequence features governing IDR phase separation. From this latent representation, a decoder predicts the concentration-dependent thermodynamic quantities of an IDR mixture. We demonstrate that this framework generalizes to IDR segments from the human proteome and predicts phase diagrams with quantitative accuracy. We further show that Euclidean distances between sequence embeddings in the latent space directly report chemical-potential differences between sequences in specified chemical environments. Finally, by systematically varying the complexity of the decoder, we gain mechanistic insight into IDR interactions in multicomponent mixtures. Collectively, these results establish our method as a practical tool for accurately predicting and designing multicomponent IDR phase separation and for identifying the molecular determinants of IDR interaction specificity.
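The encoder-decoder workflow described in the abstract can be illustrated with a deliberately simplified Python sketch. Everything in it is a hypothetical stand-in, not the authors' model: `encode` replaces the learned encoder with two hand-picked sequence features (aromatic fraction and net charge per residue), and `decode_mu` is an assumed pairwise, second-virial-like decoder that returns a chemical potential for each component from the shared latent vectors and the component concentrations. `latent_distance` computes the Euclidean distance between embeddings; because the toy features are not trained on MD data, these distances are not expected to reproduce the chemical-potential relation reported above.

```python
import math
from typing import List, Sequence

# Hypothetical residue classes used by the toy encoder below; the model in the
# abstract learns its latent space from MD data instead of hand-picked features.
AROMATIC = set("FWY")
POSITIVE = set("KR")
NEGATIVE = set("DE")

Latent = List[float]


def encode(seq: str) -> Latent:
    """Toy encoder: map a sequence to a 2-D latent vector
    (aromatic fraction, net charge per residue)."""
    n = len(seq)
    aromatic = sum(r in AROMATIC for r in seq) / n
    net_charge = (sum(r in POSITIVE for r in seq) - sum(r in NEGATIVE for r in seq)) / n
    return [aromatic, net_charge]


def latent_distance(a: Latent, b: Latent) -> float:
    """Euclidean distance between two sequence embeddings."""
    return math.dist(a, b)


def decode_mu(latents: Sequence[Latent], conc: Sequence[float], eps: float = 5.0) -> List[float]:
    """Toy pairwise decoder (assumed form): for each component i,
    mu_i = ln(c_i) + sum_j c_j * B_ij, with a hypothetical kernel
    B_ij = -eps * aro_i * aro_j + eps * q_i * q_j.
    Aromatic pairs attract; like charges repel. Concentrations must be > 0."""
    mu = []
    for z_i, c_i in zip(latents, conc):
        excess = 0.0
        for z_j, c_j in zip(latents, conc):
            b_ij = -eps * z_i[0] * z_j[0] + eps * z_i[1] * z_j[1]
            excess += c_j * b_ij
        mu.append(math.log(c_i) + excess)
    return mu


if __name__ == "__main__":
    seqs = ["GSFGSYGK", "GSDGSEGS"]
    z = [encode(s) for s in seqs]
    print(z, latent_distance(z[0], z[1]))
    print(decode_mu(z, [0.1, 0.1]))
```

For example, `encode("GSFGSYGK")` gives `[0.25, 0.125]` and `encode("GSDGSEGS")` gives `[0.0, -0.25]`; at equal concentrations of 0.1 (arbitrary units), this toy decoder assigns the aromatic, positively charged sequence the lower chemical potential. Making the decoder more expressive, for instance by adding three-body terms, is one hypothetical way to vary decoder complexity in this toy setting.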
*NIH R35GM155017
Presenters
Zhuang Liu (Princeton University)