Active Learning–Driven Design of Disordered Proteins Across Coarse-Grained Models

ORAL

Abstract

Designing protein-based materials requires navigating an immense sequence space with complex, non-intuitive structure–function relationships. For intrinsically disordered proteins (IDPs), which lack stable tertiary structure, coarse-grained (CG) models provide a tractable route to explore how sequence encodes thermodynamic and dynamic behavior. Yet these models differ in their parameterizations, raising questions about their consistency and transferability. We present a machine-learning-driven framework that couples Bayesian optimization with CG simulations to accelerate IDP sequence design across multiple models (HPS-Urry, Mpipi, and CALVADOS). By jointly optimizing diffusivity and expenditure density, which serve as proxies for dynamics and phase-separation propensity, respectively, our active-learning pipeline maps the frontier of achievable sequence behavior in each model. Comparing optimization trajectories across models exposes their distinct inductive biases and the extent to which learned design rules generalize. This framework establishes a computational testbed for assessing cross-model transferability and guiding data-efficient IDP design.
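As a rough illustration of what "mapping the frontier" can mean for two jointly optimized objectives, the sketch below picks out the non-dominated (Pareto-optimal) candidates from a set of evaluated sequences. It is not the authors' pipeline. The function name `pareto_front`, the assumption that both objectives are maximized, and the numerical values are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def pareto_front(points: Sequence[Tuple[float, float]]) -> List[int]:
    """Return indices of non-dominated points.

    Assumption (not from the abstract): both objectives are maximized.
    A point is dominated if another point is at least as good in both
    objectives and strictly better in at least one.
    """
    front = []
    for i, (a1, a2) in enumerate(points):
        dominated = any(
            (b1 >= a1 and b2 >= a2) and (b1 > a1 or b2 > a2)
            for j, (b1, b2) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Made-up (objective_1, objective_2) scores for five candidate sequences,
# standing in for values that would come from CG simulations.
candidates = [(0.2, 0.9), (0.5, 0.5), (0.4, 0.4), (0.9, 0.1), (0.1, 0.1)]
front = pareto_front(candidates)
print(front)
```

In an active-learning loop, a filter of this kind would typically be applied after each batch of simulations, and the surrogate model would then propose new sequences expected to push the frontier outward.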



Presenters

  • Zachary G Lipel

    • Princeton University

Authors

  • Zachary G Lipel

    • Princeton University
  • Wesley Oliver

    • Princeton University
  • Michael A. Webb

    • Princeton University