G-BigSMILES 2.0: A Generative Representation for Complex Polymer Architectures and Machine Learning

Yuan Tian; Gervasio Zaldivar; Ge Sun; Juan de Pablo

Oral-In-person

Abstract

A fundamental challenge in polymer informatics is the stochastic nature of polymer ensembles. Standard line notations, such as BigSMILES, are deterministic and fail to capture the probability distributions over structures that arise from synthesis. Our prior work, G-BigSMILES, addressed this by extending the BigSMILES notation with ensemble descriptors, such as molecular-weight distributions, reactivity ratios, and probability-weighted bonds, to enable generative sampling and graph representations of the complete ensemble. Building on this framework, we introduce G-BigSMILES 2.0, a backward-compatible extension that adds simplified end-group treatment, support for multiple bond descriptors per atom, and nested stochastic objects for modeling multistep polymerization. These enhancements enable the concise encoding of complex architectures, including multiblock, bottlebrush, star, and hyperbranched polymers, with tunable control over composition. A single G-BigSMILES 2.0 string generates both simulation-ready molecular ensembles and a generative graph suitable for graph neural networks. We demonstrate this by recovering target ensemble statistics from sampled structures and by integrating the notation into automated pipelines for database building, molecular dynamics, and machine learning-driven screening. G-BigSMILES 2.0 thus transforms a descriptive notation into a design-ready specification for reproducible, informatics-driven polymer discovery.
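
To make the idea of ensemble descriptors concrete, the following is a minimal sketch of the kind of sampling that a molecular-weight distribution and a pair of reactivity ratios can parameterize. It is not the G-BigSMILES syntax or the authors' implementation: it assumes a two-monomer copolymer, a Flory-Schulz (geometric) chain-length distribution, and the terminal (Mayo-Lewis) model for monomer addition, and every function and parameter name is hypothetical.

```python
import math
import random


def sample_chain_length(mean_length, rng):
    """Draw a chain length from a Flory-Schulz (geometric) distribution.

    Assumption: the number-average length is mean_length, which must be > 1.
    """
    p = 1.0 - 1.0 / mean_length      # probability that the chain keeps growing
    u = 1.0 - rng.random()           # uniform on (0, 1], avoids log(0)
    return 1 + int(math.log(u) / math.log(p))


def sample_copolymer(mean_length, r_a, r_b, feed_a, rng):
    """Sample one A/B chain with the terminal (Mayo-Lewis) model.

    r_a and r_b are reactivity ratios; feed_a is the mole fraction of A
    in the feed, held constant here (an illustrative simplification).
    """
    f_a, f_b = feed_a, 1.0 - feed_a
    # Probability that a chain end adds another monomer of its own type.
    p_aa = r_a * f_a / (r_a * f_a + f_b)
    p_bb = r_b * f_b / (r_b * f_b + f_a)

    n = sample_chain_length(mean_length, rng)
    chain = ["A" if rng.random() < f_a else "B"]
    while len(chain) < n:
        last = chain[-1]
        same = p_aa if last == "A" else p_bb
        if rng.random() < same:
            chain.append(last)
        else:
            chain.append("B" if last == "A" else "A")
    return "".join(chain)


def sample_ensemble(n_chains, mean_length=50.0, r_a=1.0, r_b=1.0,
                    feed_a=0.5, seed=0):
    """Sample a reproducible ensemble of chains as strings of 'A' and 'B'."""
    rng = random.Random(seed)
    return [sample_copolymer(mean_length, r_a, r_b, feed_a, rng)
            for _ in range(n_chains)]


if __name__ == "__main__":
    ensemble = sample_ensemble(5000, mean_length=20.0, r_a=5.0, r_b=0.2)
    mean_len = sum(len(c) for c in ensemble) / len(ensemble)
    frac_a = sum(c.count("A") for c in ensemble) / sum(len(c) for c in ensemble)
    print(f"mean length: {mean_len:.1f}, fraction of A: {frac_a:.2f}")
```

Comparing the sampled mean length and composition against the input parameters is a small-scale analogue of the "recovering target statistics" check described in the abstract; a fixed seed keeps the ensemble reproducible.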

March 16, 2026, 5:42 PM – March 16, 2026, 5:54 PM

Presenters

Yuan Tian
- New York University

Authors

Yuan Tian
- New York University
Gervasio Zaldivar
Ge Sun
- New York University
Juan de Pablo