Similarity Metric for Data Optimization and Efficient Training of Reactive Machine Learning Force Fields

ORAL

Abstract

Machine learning force fields (MLFFs) are now widely used to investigate reactive processes with quantum mechanical accuracy. These models require training datasets that provide representative information on the atomic environments of chemical systems. A major challenge in dataset construction lies in generating structures that capture the highly complex and diverse nature of reaction chemistry while minimizing redundancy. A common approach to address this issue is to remove data with high structural similarity using metrics such as D-optimality. However, such methods may inadvertently eliminate structures containing valuable information when chemical reactions involve local changes in only a few atoms.

To overcome this limitation, we developed a dataset optimization method based on a cosine similarity metric that performs atom-by-atom comparisons and pinpoints important data points associated with rare or localized events. We apply this approach to a dataset for the radiolysis of polyethylene and train a ChIMES MLFF model. Our method reduces the dataset size by approximately 70% while maintaining accuracy across various molecular and polymeric systems, including simple alkanes, unsaturated carbon bonds, and radiolysis processes. Finally, we perform molecular dynamics simulations of radiolytic damage in large-scale systems to mitigate finite-size effects. Overall, our approach produces an MLFF that preserves accuracy while significantly improving computational efficiency.
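The abstract does not spell out the selection algorithm, so the following is only a minimal sketch of how an atom-by-atom cosine similarity filter could work, not the authors' implementation. It assumes each structure is already represented by a hypothetical array of per-atom descriptor vectors, and it uses an illustrative greedy rule with an assumed `threshold` parameter: a structure is kept if at least one of its atoms is not closely matched by any atom in the structures kept so far. The names `normalize_rows` and `select_structures` are introduced here for illustration.

```python
import numpy as np


def normalize_rows(x):
    """Scale each row to unit length so dot products equal cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)  # guard against zero-length rows


def select_structures(structures, threshold=0.99):
    """Greedy, illustrative dataset reduction (an assumption, not the paper's method).

    structures: list of arrays, each of shape (n_atoms_i, n_features), holding
                hypothetical per-atom descriptor vectors.
    threshold:  assumed cutoff; an atom counts as "already represented" when its
                best cosine similarity to a kept atom is >= threshold.
    Returns the indices of the structures that are kept.
    """
    kept = []
    pool = None  # unit-normalized descriptors of all atoms in kept structures
    for idx, desc in enumerate(structures):
        unit = normalize_rows(np.asarray(desc, dtype=float))
        if pool is None:
            novel = True  # the first structure is always kept
        else:
            sims = unit @ pool.T      # (n_atoms, n_pool) cosine similarities
            best = sims.max(axis=1)   # best match for each atom
            # One poorly represented atom is enough to keep the structure,
            # so a local change in a few atoms is not averaged away.
            novel = bool((best < threshold).any())
        if novel:
            kept.append(idx)
            pool = unit if pool is None else np.vstack([pool, unit])
    return kept


# Toy usage with 2-feature descriptors: structure 2 has one atom in a new
# environment and is kept; structures 1 and 3 add nothing new and are dropped.
toy = [
    np.array([[1.0, 0.0], [1.0, 0.0]]),
    np.array([[2.0, 0.0], [1.0, 0.0]]),
    np.array([[1.0, 0.0], [0.0, 1.0]]),
    np.array([[0.0, 3.0], [1.0, 0.0]]),
]
kept_indices = select_structures(toy)
```

Because the decision is made per atom rather than on a whole-structure descriptor, a structure that differs from the kept set in only one atomic environment is still retained. Note that cosine similarity ignores vector magnitude, so descriptors that differ only in scale are treated as identical in this sketch.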

*This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract number DE-AC52-07NA27344.

Publication: Kwangnam Kim, Matthew P. Kroonblawd, and Nir Goldman, "Similarity Metric for Data Optimization and Efficient Training of Reactive Machine Learning Force Fields for Hydrocarbon Radiolysis," submitted.

Presenters

  • Kwangnam Kim

    • Lawrence Livermore National Laboratory

Authors

  • Kwangnam Kim

    • Lawrence Livermore National Laboratory
  • Matthew P. Kroonblawd

    • Lawrence Livermore National Laboratory
  • Nir Goldman

    • Lawrence Livermore National Laboratory