G-BigSMILES 2.0: A Generative Representation for Complex Polymer Architectures and Machine Learning

ORAL

Abstract

A fundamental challenge in polymer informatics is the stochastic nature of polymer ensembles. Standard line notations, like BigSMILES, are deterministic and fail to capture the structural probability distributions that arise from synthesis. Our prior work, G-BigSMILES, addressed this by extending the BigSMILES notation with ensemble descriptors, such as molecular-weight distributions, reactivity ratios, and probability-weighted bonds, to enable generative sampling and graph representations of the complete ensemble. Building on this framework, we introduce G-BigSMILES 2.0, a backward-compatible extension that incorporates simplified end-group treatment, support for multiple bond descriptors per atom, and nested stochastic objects to model multistep polymerization. These enhancements enable the concise encoding of complex architectures, including multiblock, bottlebrush, star, and hyperbranched polymers, with tunable control over composition. A single G-BigSMILES 2.0 string generates both simulation-ready molecular ensembles and a generative graph for graph neural networks. We demonstrate this by recovering target statistics and by integrating the notation into automated pipelines for database building, molecular dynamics, and machine learning-driven screening. G-BigSMILES 2.0 thus transforms a descriptive notation into a design-ready specification for reproducible, informatics-driven polymer discovery.
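
To make the idea of generative sampling and "recovering target statistics" concrete, the following is a minimal sketch in plain Python. It is not the G-BigSMILES syntax, parser, or reference implementation. The function names, the monomer labels "A" and "B", the geometric chain-length distribution, and the composition-independent monomer weights are all assumptions chosen for illustration; real reactivity ratios would make each choice depend on the preceding monomer.

```python
import random


def sample_chain(rng, monomer_weights, mean_length):
    """Draw one linear chain (hypothetical helper, not part of any G-BigSMILES tool).

    Chain length follows a geometric distribution with the given mean
    (an assumed stand-in for a real molecular-weight distribution).
    Each monomer is picked independently according to its weight.
    """
    p_stop = 1.0 / mean_length
    monomers = list(monomer_weights)
    weights = [monomer_weights[m] for m in monomers]
    chain = [rng.choices(monomers, weights=weights)[0]]
    # Keep growing with probability 1 - p_stop after every added monomer.
    while rng.random() > p_stop:
        chain.append(rng.choices(monomers, weights=weights)[0])
    return chain


def sample_ensemble(n_chains, monomer_weights, mean_length, seed=0):
    """Sample an ensemble of chains reproducibly from one specification."""
    rng = random.Random(seed)
    return [sample_chain(rng, monomer_weights, mean_length) for _ in range(n_chains)]


def ensemble_statistics(chains):
    """Return the number-average chain length and the overall monomer fractions."""
    total = sum(len(chain) for chain in chains)
    counts = {}
    for chain in chains:
        for monomer in chain:
            counts[monomer] = counts.get(monomer, 0) + 1
    fractions = {m: c / total for m, c in counts.items()}
    return total / len(chains), fractions
```

Calling `ensemble_statistics(sample_ensemble(5000, {"A": 0.7, "B": 0.3}, mean_length=20))` should return a number-average length close to 20 and an "A" fraction close to 0.7, which is the sense in which a sampled ensemble recovers the statistics encoded in its specification.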

Presenters

  • Yuan Tian

    • New York University
    • The University of Chicago
    • University of North Carolina at Chapel Hill

Authors

  • Yuan Tian

    • New York University
    • The University of Chicago
    • University of North Carolina at Chapel Hill
  • Gervasio Zaldivar

    • New York University
  • Ge Sun

    • New York University
  • Juan de Pablo

    • New York University