Entropy-Maximized Fingerprint Clustering for Building Robust Machine Learning Potentials of Carbon

ORAL

Abstract

Developing reliable machine learning potentials (MLPs) requires large and diverse training datasets, yet generating such datasets remains challenging. Conventional molecular dynamics (MD) often yields redundant configurations that lead to overfitting. The recent synthesis of bulk hexagonal diamond highlights the need for accurate MLPs capable of capturing the rich bonding diversity within the carbon family. We propose a data-driven pipeline that integrates structural fingerprints with an entropy maximization scheme to construct a representative dataset for carbon. Candidate structures are generated via MD, and fingerprint descriptors are used to quantify structural similarity. Clustering identifies representative configurations, and entropy maximization then adds further diverse structures. This pipeline enhances structural diversity and improves the accuracy and transferability of the resulting potential across multiple carbon phases. The methodology is general and can be extended to other material systems.
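
The select-then-diversify idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the fingerprint, the clustering algorithm, or the entropy functional. The sketch assumes that fingerprints are plain numeric vectors, swaps in a simple distance-threshold (leader) clustering, and uses the sum of log nearest-neighbor distances as a stand-in entropy measure. All function names, the threshold, and the toy data are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two fingerprint vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_representatives(fps, threshold):
    """Leader clustering (an assumed stand-in for the clustering step).

    A structure becomes a new representative only if it is farther than
    `threshold` from every representative chosen so far.
    """
    reps = []
    for i, fp in enumerate(fps):
        if all(distance(fp, fps[j]) > threshold for j in reps):
            reps.append(i)
    return reps

def entropy_proxy(fps, selected):
    """Sum of log nearest-neighbor distances within the selected set.

    This is an assumed proxy for entropy: larger values mean the selected
    fingerprints are more spread out. It is only compared between sets of
    equal size.
    """
    if len(selected) < 2:
        return 0.0
    total = 0.0
    for i in selected:
        nearest = min(distance(fps[i], fps[j]) for j in selected if j != i)
        total += math.log(nearest + 1e-12)  # small offset guards against log(0)
    return total

def add_by_entropy(fps, selected, n_extra):
    """Greedily add the structures that most increase the entropy proxy."""
    selected = list(selected)
    remaining = [i for i in range(len(fps)) if i not in selected]
    for _ in range(n_extra):
        if not remaining:
            break
        best = max(remaining, key=lambda i: entropy_proxy(fps, selected + [i]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy fingerprints (hypothetical): two tight groups of near-duplicate
# structures, as redundant MD snapshots might give, plus one distinct one.
fps = (
    [(0.01 * k, 0.0) for k in range(20)]
    + [(5.0 + 0.01 * k, 5.0) for k in range(20)]
    + [(10.0, 0.0)]
)

reps = cluster_representatives(fps, threshold=1.0)
selected = add_by_entropy(fps, reps, n_extra=2)
print("cluster representatives:", reps)
print("after entropy-guided additions:", selected)
```

In a real workflow the toy vectors would be replaced by fingerprints computed from MD snapshots, and the threshold and the number of added structures would need tuning for the system at hand.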

Presenters

  • Meiyan Wang

    • Rutgers University - Newark

Authors

  • Meiyan Wang

    • Rutgers University - Newark
  • Li Zhu

    • Rutgers University - Newark
  • Rishi Rao

    • Rutgers University - Newark