Diverse training data generation for machine-learning interatomic potentials
ORAL
Abstract
Machine-learning interatomic potentials (MLIAPs) make it feasible to aim for both accuracy and transferability, a combination that earlier generations of potentials struggled to achieve. However, given their flexibility, these potentials often fail to extrapolate to properties beyond their training data, which makes the quality of the training set the determining factor in their performance. Training datasets typically consist of DFT energies and forces for relatively small systems, traditionally selected manually or randomly from subsets of the configuration space of interest. The need for human intervention in curating training sets makes their generation labor-intensive and time-consuming. Here, we present a generalization of a previously developed method based on the automated maximization of the information entropy of the descriptor distribution. The diversity of the entropy-optimized training dataset is compared to that of several other datasets from the literature. In addition, the method is applied to train MLIAPs for Be, W, and Re: W and Be are primary candidates for first-wall materials in fusion reactors, while Re is a product of transmutation of W under neutron bombardment. The transferability of MLIAPs trained on the entropy-optimized data is compared to that of MLIAPs trained with traditionally curated datasets, highlighting the desirable characteristics of an optimal training dataset, irrespective of the material chemistry.
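To make the entropy-maximization idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes per-configuration descriptor vectors have already been computed (e.g., averaged per-atom bispectrum or ACE descriptors), uses the Kozachenko-Leonenko nearest-neighbor entropy estimator as one possible choice of entropy estimate, and grows a training set greedily; the function names, the seeding choice, and the greedy loop are all illustrative assumptions.

```python
import numpy as np
from math import gamma, log, pi
from scipy.spatial import cKDTree
from scipy.special import digamma

def nn_entropy(X):
    """Kozachenko-Leonenko nearest-neighbor estimate of the differential
    entropy of a set of descriptor vectors X, shape (n, d)."""
    n, d = X.shape
    r, _ = cKDTree(X).query(X, k=2)        # r[:, 1] = distance to 1st neighbor
    vd = pi ** (d / 2) / gamma(d / 2 + 1)  # volume of the unit d-ball
    return digamma(n) - digamma(1) + log(vd) + d * np.mean(np.log(r[:, 1] + 1e-12))

def greedy_entropy_selection(desc, n_select, seed=0):
    """Greedily grow a training set by adding, at each step, the candidate
    configuration whose descriptor most increases the set's entropy.
    `seed` is an arbitrary starting configuration (illustrative choice)."""
    selected = [seed]
    remaining = set(range(len(desc))) - {seed}
    while len(selected) < n_select and remaining:
        best = max(remaining, key=lambda i: nn_entropy(desc[selected + [i]]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: 500 candidate configurations with 10-dimensional descriptors.
rng = np.random.default_rng(42)
desc = rng.normal(size=(500, 10))
train_ids = greedy_entropy_selection(desc, n_select=50)
```

The greedy loop re-estimates the entropy of the whole selected set at each step, which is quadratic in the number of candidates; it is meant only to show how an entropy objective replaces manual or random curation, not to reflect the scalability of the actual method.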
Presenters
- Aparna P. A. Subramanyam, Los Alamos National Laboratory
Authors
- Aparna P. A. Subramanyam, Los Alamos National Laboratory
- Danny Perez, Los Alamos National Laboratory