Labeling Local Atomic Environments for Reliable General-Purpose Machine Learning Force Fields

Anton Charkin-Gorbulin; Miguel Gallegos Gonzalez; Igor Poltavsky; Alexandre Tkatchenko

Oral-In-person

Abstract

General-purpose machine-learning force fields (GP-MLFFs) aim to deliver transferable models across elements, bonding types, and periodic or non-periodic systems, as exemplified by MACE-OFF¹, SO3LR², and Orb³. Their training depends on large, diverse datasets such as OMol25⁴, yet dataset diversity is usually characterized only at the system level, with no metric to assess the coverage of local atomic environments.

We introduce a universal labeling framework for atom-centered environments that encodes coordination motifs and functional-group identities into compact label IDs. These identifiers quantify dataset coverage, reveal blind spots and redundancies, and provide a principled basis for dataset cleaning and merging. Integrated with GP-MLFFs, the labels enable rapid inference-time checks that flag unseen environments prone to extrapolation. We show that label-guided training and inference with MACE-OFF and SO3LR models yield more robust, extrapolation-aware molecular dynamics. Overall, the framework offers a practical, interpretable, and architecture-agnostic diversity metric that serves as both a benchmark for dataset quality and a built-in safety layer for GP-MLFFs.
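The idea of labeling, coverage counting, and inference-time flagging can be illustrated with a minimal sketch. It is not the authors' labeling scheme: the label format (central element plus sorted bonded-neighbor elements), the bond criterion (1.2 times the sum of commonly tabulated covalent radii), the function names, and the approximate toy geometries are all assumptions made here for illustration, and functional-group identities are not encoded.

```python
import math
from collections import Counter

# Commonly tabulated covalent radii in angstrom; only the elements used below.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}
BOND_SCALE = 1.2  # assumed tolerance on the sum of covalent radii


def label_atom(symbols, positions, i):
    """Hypothetical label: central element, then sorted bonded-neighbor elements."""
    neighbors = []
    for j, (sym, pos) in enumerate(zip(symbols, positions)):
        if j == i:
            continue
        limit = BOND_SCALE * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[sym])
        if math.dist(positions[i], pos) <= limit:
            neighbors.append(sym)
    return f"{symbols[i]}|{'.'.join(sorted(neighbors))}"


def label_structure(symbols, positions):
    """Label every atom of one non-periodic structure."""
    return [label_atom(symbols, positions, i) for i in range(len(symbols))]


def coverage(structures):
    """Count how often each label occurs across a dataset of structures."""
    counts = Counter()
    for symbols, positions in structures:
        counts.update(label_structure(symbols, positions))
    return counts


def flag_unseen(symbols, positions, known_labels):
    """Return (atom index, label) pairs whose label is absent from the dataset."""
    labels = label_structure(symbols, positions)
    return [(i, lab) for i, lab in enumerate(labels) if lab not in known_labels]


# Approximate toy geometries in angstrom (assumed, not taken from any dataset).
METHANE = (
    ["C", "H", "H", "H", "H"],
    [(0.0, 0.0, 0.0), (0.629, 0.629, 0.629), (-0.629, -0.629, 0.629),
     (-0.629, 0.629, -0.629), (0.629, -0.629, -0.629)],
)
WATER = (
    ["O", "H", "H"],
    [(0.0, 0.0, 0.117), (0.0, 0.757, -0.469), (0.0, -0.757, -0.469)],
)
METHANOL = (
    ["C", "O", "H", "H", "H", "H"],
    [(0.0, 0.0, 0.0), (1.42, 0.0, 0.0), (1.717, 0.913, 0.0),
     (-0.363, 1.028, 0.0), (-0.363, -0.514, 0.89), (-0.363, -0.514, -0.89)],
)

known = coverage([METHANE, WATER])      # "training set" label counts
flagged = flag_unseen(*METHANOL, known)  # environments absent from training
print(known)
print(flagged)  # [(0, 'C|H.H.H.O'), (1, 'O|C.H')]
```

In this toy case the hydrogen environments of methanol are already covered by methane and water, while the carbon and oxygen environments are flagged as unseen. A production scheme would also need periodic images, a richer description of the neighborhood, and a more careful bonding criterion.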

¹D. P. Kovács, et al., J. Am. Chem. Soc. 147, 17598 (2025).

²A. Kabylda, et al., J. Am. Chem. Soc. 147, 33723 (2025).

³M. Neumann, et al., arXiv:2410.22570 (2024).

⁴D. S. Levine, et al., arXiv:2505.08762 (2025).

March 15, 2026, 6:00 AM – March 15, 2026, 6:00 AM

Presenters

Anton Charkin-Gorbulin
- University of Luxembourg

Authors

Anton Charkin-Gorbulin
- University of Luxembourg
Miguel Gallegos Gonzalez
Igor Poltavsky
Alexandre Tkatchenko
- University of Luxembourg