On the Effect of Training Data on Machine Learning Phonon Dispersion
ORAL
Abstract
Recent advances in machine learning interatomic potential (MLIP) architectures have improved both the accuracy and scalability of energy and force predictions in chemical systems for many practical applications. Less attention has been paid to the data and the pipeline used to train MLIP models, specifically to which aspects of the training data (generated through ab initio simulations) and the pipeline affect predictions of quantities of interest, and by how much. Here, taking diamond phonon dispersion prediction as a case study, we present examples of how different properties of the training dataset and the training pipeline affect phonon dispersion predictions. Specifically, we point out the roles of the planewave cutoff, simulation cell size, training dataset size, and the random seed used for model parameter initialization. Potential strategies to mitigate these sources of variation are also discussed.
*This research is supported by the Computational Materials Science program of the U.S. Department of Energy, Office of Science, Basic Energy Sciences under Award DE-SC0020129. Computational resources were provided by the National Energy Research Scientific Computing Center (a DOE Office of Science User Facility supported under Contract No. DE-AC02-05CH11231), the Argonne Leadership Computing Facility (a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357), and the Texas Advanced Computing Center (TACC) at The University of Texas at Austin.
Presenters
Jaesuk Park (University of Texas at Austin)