Unbiasing machine learning for molecular dynamics: emphasising out-of-equilibrium geometries using clustering

ORAL

Abstract

Machine learning (ML) force fields (FFs) have become increasingly popular tools in computational physics due to their speed and accuracy. By construction, ML models are often biased towards the more abundant "close-to-equilibrium" states. A small mean error therefore does not guarantee accurate predictions for rare "out-of-equilibrium" configurations, which are typically underrepresented in reference datasets.

We propose a method to train unbiased ML FFs, which yield equally accurate predictions regardless of the density of the training data. To achieve this, we divide datasets into smaller subsets (clusters) based on data similarities. The quality of an ML model is then evaluated on each individual cluster, thereby revealing problematic cases. Representative data from each problematic cluster are added to the training set, and the ML model is retrained. The improved learning process flattens the prediction errors across the reference data. The method is applied to molecular trajectory datasets, decreasing the largest errors of the resulting ML FFs by up to an order of magnitude.
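The per-cluster evaluation and augmentation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes cluster labels and per-sample absolute errors are already available from any clustering algorithm and any trained model, and the function names, the threshold rule (`factor` times the overall mean error), and the choice of the worst-predicted points as "representative" data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_cluster_mae(labels, abs_errors):
    """Return the mean absolute error of the model within each cluster."""
    grouped = defaultdict(list)
    for label, err in zip(labels, abs_errors):
        grouped[label].append(err)
    return {label: mean(errs) for label, errs in grouped.items()}


def pick_augmentation_indices(labels, abs_errors, train_idx,
                              factor=2.0, per_cluster=2):
    """Select sample indices to add to the training set before retraining.

    Assumed criterion (not from the abstract): a cluster is flagged as
    problematic when its MAE exceeds `factor` times the overall MAE.
    From each flagged cluster, the `per_cluster` worst-predicted samples
    that are not yet in the training set are selected.
    """
    overall = mean(abs_errors)
    already_used = set(train_idx)
    chosen = []
    for label, mae in sorted(per_cluster_mae(labels, abs_errors).items()):
        if mae <= factor * overall:
            continue  # cluster is predicted well enough; skip it
        candidates = [i for i, lab in enumerate(labels)
                      if lab == label and i not in already_used]
        candidates.sort(key=lambda i: abs_errors[i], reverse=True)
        chosen.extend(candidates[:per_cluster])
    return chosen


# Toy data: cluster 0 is abundant and well predicted, cluster 1 is rare
# and poorly predicted, cluster 2 is rare but well predicted.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
abs_errors = [0.1, 0.1, 0.2, 0.1, 0.1, 0.2, 1.5, 2.5, 0.2, 0.2]
train_idx = [0, 1, 8]

new_idx = pick_augmentation_indices(labels, abs_errors, train_idx)
print(per_cluster_mae(labels, abs_errors))
print(new_idx)
```

In this toy example only the rare, poorly predicted cluster is flagged, so the indices returned all come from it. In a full workflow the model would be retrained on `train_idx + new_idx` and the evaluation repeated until the per-cluster errors are sufficiently flat.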

Presenters

  • Grégory Cordeiro Fonseca

    University of Luxembourg

Authors

  • Grégory Cordeiro Fonseca

    University of Luxembourg

  • Igor Poltavskyi

    University of Luxembourg Limpertsberg, University of Luxembourg

  • Alexandre Tkatchenko

    Physics and Materials Science Research Unit, University of Luxembourg, University of Luxembourg Limpertsberg