Data-Efficient AI Framework for Polymer Design through Hierarchical Representation
ORAL
Abstract
Designing polymers with targeted properties remains a long-standing challenge due to their vast chemical diversity and the scarcity of high-quality data. While AI has transformed molecular and materials design, progress in polymer science is often constrained by such data limitations and the inherent complexity of polymer architectures. In this work, we present a data-efficient machine learning framework that integrates coarse-grained functional-group representations with attention mechanisms to capture chemical context and hierarchical interactions within polymeric systems. By representing molecules as graphs of functional groups, the model leverages physical and chemical priors from group-contribution theory to learn meaningful, low-dimensional embeddings that substantially reduce training data requirements while preserving interpretability. It achieves high predictive accuracy across multiple thermophysical properties and can invert its latent space to generate chemically valid candidates with targeted characteristics. When integrated with G-BigSMILES, a generative grammar that encodes stochastic polymer ensembles and complex topologies, this framework forms a unified AI pipeline that bridges polymer representation, learning, and inverse design. This combined approach demonstrates how domain-informed AI can provide a scalable and interpretable route to polymer discovery under realistic data constraints.
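As a rough illustration of the representation described above, the sketch below treats a repeat unit as a graph of functional groups, applies one round of neighbor-restricted attention so that each group's vector reflects its chemical context, and then sums the group vectors in the spirit of group-contribution additivity. This is a minimal sketch under stated assumptions, not the authors' model: the PET-like decomposition, the toy embedding values, and the names `attention_layer` and `pool` are all hypothetical, and nothing here is trained.

```python
import math

# Hypothetical coarse-graining of a PET-like repeat unit into functional groups.
# Nodes are groups; edges are bonds between adjacent groups along the backbone.
groups = ["COO", "CH2", "CH2", "COO", "C6H4"]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

# Toy 2-d embeddings per group type (made-up numbers, not fitted parameters).
embedding = {"COO": [1.0, 0.2], "CH2": [0.1, 0.5], "C6H4": [0.8, 0.9]}


def neighbors(n_nodes, edges):
    """Map each node to the set of itself plus its bonded neighbors."""
    nbrs = {i: {i} for i in range(n_nodes)}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs


def attention_layer(feats, nbrs):
    """One round of scaled dot-product attention restricted to graph neighbors."""
    dim = len(feats[0])
    out = []
    for i, h_i in enumerate(feats):
        idx = sorted(nbrs[i])
        scores = [
            sum(x * y for x, y in zip(h_i, feats[j])) / math.sqrt(dim)
            for j in idx
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * feats[j][d] for w, j in zip(weights, idx))
            for d in range(dim)
        ])
    return out


def pool(feats):
    """Sum group vectors into one descriptor (additive, group-contribution style)."""
    return [sum(f[d] for f in feats) for d in range(len(feats[0]))]


feats = [embedding[g] for g in groups]
context = attention_layer(feats, neighbors(len(groups), edges))
polymer_vector = pool(context)
```

In this toy setup the two ester groups start from the same embedding but end up with different context vectors, because one is bonded to an aromatic ring and the other is not; a plain additive group-contribution sum would not capture that distinction.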
Publication: arXiv:2502.00910
Presenters
- Ge Sun
- New York University