Datasets of Unusual Size: Benchmark Databases of Non-Covalent Interaction Energies of CCSD(T)/CBS Accuracy
ORAL
Abstract
We present a new benchmark collection, DES360K, containing interaction energies for 366,117 dimer geometries computed using coupled-cluster theory with single, double, and perturbative triple excitations [CCSD(T)] extrapolated to the complete basis set (CBS) limit, a level of theory widely considered the “gold standard” of quantum chemistry. Our collection spans 392 unique small molecules in 3,697 combinations and includes both energy-minimized dimers and dimers extracted from molecular dynamics simulations. We extend our dataset using SNS-MP2-Ext, a neural network-based method, trained on the DES360K dataset, that predicts CCSD(T) interaction energies from MP2 inputs. SNS-MP2-Ext revises our original SNS-MP2 approach to be extensive by construction, resulting in significantly lower prediction errors and narrower confidence intervals than the original SNS-MP2. More importantly, SNS-MP2-Ext eliminates the large variations in predictive performance that SNS-MP2 shows across chemical classes and energy scales. We have used both SNS-MP2 and SNS-MP2-Ext to expand our dataset by another 5 million unique data points at minimal computational expense while maintaining CCSD(T)-level accuracy. The resulting collection, DES5M, is the largest collection of gold-standard data made available to date.
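As a rough illustration of the two quantities named in the abstract, the sketch below computes a dimer interaction energy as the dimer energy minus the two monomer energies, and extrapolates a correlation energy to the CBS limit. The abstract does not say which extrapolation scheme was used; the two-point inverse-cubic formula here is a commonly used one and is an assumption, as are the function names and the numbers in the usage lines.

```python
def interaction_energy(e_dimer, e_monomer_a, e_monomer_b):
    """Interaction energy of a dimer AB: E(AB) - E(A) - E(B)."""
    return e_dimer - e_monomer_a - e_monomer_b


def cbs_two_point(e_corr_x, e_corr_y, x, y):
    """Two-point inverse-cubic extrapolation of the correlation energy.

    Assumes E_corr(X) = E_cbs + A / X**3 for basis-set cardinal numbers
    X > Y (for example X=4, Y=3). This scheme is an assumption; the
    abstract does not specify the one actually used.
    """
    return (x**3 * e_corr_x - y**3 * e_corr_y) / (x**3 - y**3)


# Hypothetical numbers, chosen only to exercise the functions.
e_int = interaction_energy(-152.750, -76.370, -76.372)

# Synthetic correlation energies that follow the assumed 1/X**3 form exactly.
e_cbs_true, a = -1.0, 0.5
e_tz = e_cbs_true + a / 3**3
e_qz = e_cbs_true + a / 4**3
e_cbs = cbs_two_point(e_qz, e_tz, 4, 3)
```

When the input energies follow the assumed 1/X^3 form exactly, the extrapolation recovers the limiting value; real calculations only approximate that form.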
Presenters
- Elizabeth Decolvenaere, D. E. Shaw Research
Authors
- Elizabeth Decolvenaere, D. E. Shaw Research
- Robert T. McGibbon, D. E. Shaw Research
- Andrew Garvin Taube, D. E. Shaw Research
- Alexander G. Donchev, D. E. Shaw Research
- John L. Klepeis, D. E. Shaw Research
- David E. Shaw, D. E. Shaw Research