A Tale of Two Splines: Towards Next-Generation Density Functionals and Machine Learning for Chemistry

ORAL · Invited

Abstract

I will present a framework that unites the two primary strategies for constructing density functional approximations (DFAs): non-empirical constraint satisfaction and empirical data-driven optimization. The proposed method employs B-splines, bell-shaped spline functions with compact support, to construct each inhomogeneity correction factor (ICF). Compared with traditional polynomial expansions, this choice enables explicit enforcement of both linear and non-linear constraints, as well as control over ICF smoothness via Tikhonov and penalized B-spline (P-spline) regularization. As a proof of concept, we used the resulting Constrained And Smoothed Empirical (CASE) framework to construct a constraint-satisfying, data-driven global hybrid DFA that exhibits enhanced performance across a diverse set of chemical properties. We argue that the CASE approach can be used to generate next-generation DFAs that maintain the physical rigor and transferability of non-empirical DFAs while leveraging high-quality quantum-mechanical data to remove the arbitrariness of ansatz selection and improve performance.
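The constrained, regularized B-spline fit described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the CASE implementation. It fits a single one-dimensional ICF F(s) on a hypothetical clamped cubic knot vector to synthetic noisy data (the PBE exchange enhancement factor serves only as a convenient smooth target). It penalizes second differences of the spline coefficients (the P-spline term, weight `lam_p`) plus a small ridge (Tikhonov) term (`lam_t`), and it enforces one linear constraint, the uniform-electron-gas limit F(0) = 1, exactly through a KKT system. The function names and all parameter values are hypothetical; non-linear or inequality constraints would require a constrained optimizer rather than a single linear solve.

```python
import numpy as np

def bspline_basis(x, t, k):
    """Evaluate all degree-k B-spline basis functions on knot vector t at points x
    via the Cox-de Boor recursion. Returns shape (len(x), len(t) - k - 1)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.asarray(t, dtype=float)
    # Degree 0: indicator functions of the half-open knot spans [t_i, t_{i+1}).
    B = np.zeros((x.size, len(t) - 1))
    for i in range(len(t) - 1):
        B[:, i] = (t[i] <= x) & (x < t[i + 1])
    # Close the last non-empty span so the right endpoint is covered.
    last = np.nonzero(t[:-1] < t[1:])[0].max()
    B[x == t[-1], last] = 1.0
    # Raise the degree one step at a time (0/0 terms are treated as zero).
    for d in range(1, k + 1):
        B_new = np.zeros((x.size, len(t) - 1 - d))
        for i in range(len(t) - 1 - d):
            left, right = t[i + d] - t[i], t[i + d + 1] - t[i + 1]
            if left > 0:
                B_new[:, i] += (x - t[i]) / left * B[:, i]
            if right > 0:
                B_new[:, i] += (t[i + d + 1] - x) / right * B[:, i + 1]
        B = B_new
    return B

def fit_constrained_pspline(x, y, t, k, lam_p, lam_t, x0, f0):
    """Minimize ||B c - y||^2 + lam_p ||D2 c||^2 + lam_t ||c||^2 subject to F(x0) = f0."""
    B = bspline_basis(x, t, k)
    n = B.shape[1]
    D2 = np.diff(np.eye(n), n=2, axis=0)        # second-difference (P-spline) operator
    H = B.T @ B + lam_p * D2.T @ D2 + lam_t * np.eye(n)
    a = bspline_basis([x0], t, k)[0]             # linear constraint row: a @ c = f0
    # KKT system for the equality-constrained least-squares problem.
    K = np.block([[H, a[:, None]], [a[None, :], np.zeros((1, 1))]])
    rhs = np.concatenate([B.T @ y, [f0]])
    return np.linalg.solve(K, rhs)[:n]

# Synthetic target: PBE exchange enhancement factor plus noise (illustrative data only).
kappa, mu = 0.804, 0.21951
def f_pbe(s):
    return 1.0 + kappa - kappa / (1.0 + mu * s**2 / kappa)

rng = np.random.default_rng(0)
s_data = np.linspace(0.0, 3.0, 200)
y_data = f_pbe(s_data) + rng.normal(scale=0.005, size=s_data.size)

k = 3
t = np.concatenate([[0.0] * k, np.linspace(0.0, 3.0, 9), [3.0] * k])  # clamped cubic knots
c = fit_constrained_pspline(s_data, y_data, t, k, lam_p=0.1, lam_t=1e-6, x0=0.0, f0=1.0)

s_grid = np.linspace(0.0, 3.0, 61)
F_fit = bspline_basis(s_grid, t, k) @ c
print(F_fit[0])  # ~1.0: the s = 0 constraint holds to round-off
```

A pointwise constraint on F is a single linear row a·c = f0 for any linear expansion; with clamped B-splines that row is especially simple (at s = 0 only the first basis function is nonzero), and compact support keeps each coefficient's influence local, so the smoothness penalty and the constraint do not fight each other across the whole domain.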

I will also present a framework for transforming the information in semi-local descriptors of the electron density (i.e., ρ(r) and ∇ρ(r)), the quantum-mechanical objects at the heart of density functional theory (DFT), into feature vectors for machine learning (ML). As a proof of principle, we consider the task of learning molecular conformational energies and introduce the Semi-Local Density Fingerprint (SLDF) descriptor, which transforms the most energetically relevant information in ρ(r) into compact, fixed-size feature vectors. These vectors are invariant to translations, rotations, and atomic permutations; unique (at least up to the DFT exchange-correlation energy); and smooth with respect to changes in atomic positions. They also exhibit enhanced transferability to systems containing elements from wider swaths of the periodic table. We demonstrate that SLDF-based ML models predict conformational energies that are often more than 100× more accurate than semi-local DFT. The transferability of SLDF-based ML models is showcased by their ability to rectify the qualitatively incorrect semi-local DFT description of the oxirene potential energy surface without seeing a single oxirene conformation during training.
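The abstract does not specify how SLDF is constructed, so the sketch below is not SLDF. It is a hypothetical, generic density fingerprint that only illustrates how semi-local quantities can yield a fixed-size vector with the invariances listed above. The reduced density gradient s(r) = |∇ρ(r)| / (2(3π²)^(1/3) ρ(r)^(4/3)) depends only on scalars that are unchanged by rigid rotations and translations, a quadrature sum over grid points does not depend on their ordering, and a fixed set of Gaussian bins gives a fixed length with entries that vary smoothly with s. The toy Gaussian density, bin centers, smearing width, and all function names are assumptions made for illustration.

```python
import numpy as np

def toy_density(points, centers, alpha=1.0):
    """Hypothetical toy density: a sum of unit Gaussians on the atoms.
    Returns rho with shape (P,) and its analytic gradient with shape (P, 3)."""
    diff = points[:, None, :] - centers[None, :, :]            # (P, A, 3)
    g = np.exp(-alpha * np.sum(diff**2, axis=-1))              # (P, A)
    rho = g.sum(axis=1)
    grad = np.sum(-2.0 * alpha * diff * g[..., None], axis=1)  # (P, 3)
    return rho, grad

def reduced_gradient(rho, grad, rho_floor=1e-12):
    """Dimensionless reduced gradient s = |grad rho| / (2 (3 pi^2)^(1/3) rho^(4/3))."""
    c_s = 2.0 * (3.0 * np.pi**2) ** (1.0 / 3.0)
    rho = np.maximum(rho, rho_floor)  # guard the denominator in low-density regions
    return np.linalg.norm(grad, axis=1) / (c_s * rho ** (4.0 / 3.0))

def density_fingerprint(rho, grad, weights, s_centers, width=0.25):
    """Fixed-size feature vector: a Gaussian-smeared, density-weighted histogram of s(r)."""
    s = reduced_gradient(rho, grad)
    kernel = np.exp(-(((s[:, None] - s_centers[None, :]) / width) ** 2))  # (P, nbins)
    return (weights * rho) @ kernel  # quadrature sum over grid points

rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0, 0.0], [0.0, 0.9, 0.7], [0.0, -0.9, 0.7]])  # toy triatomic
points = rng.uniform(-3.0, 3.0, size=(4000, 3))        # Monte Carlo "grid"
weights = np.full(len(points), 6.0**3 / len(points))   # uniform quadrature weights
s_centers = np.linspace(0.0, 3.0, 16)

rho, grad = toy_density(points, centers)
fp = density_fingerprint(rho, grad, weights, s_centers)

# Rigidly rotate (about x) and translate the molecule together with its grid.
theta = 0.7
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(theta), -np.sin(theta)],
              [0.0, np.sin(theta), np.cos(theta)]])
shift = np.array([1.5, -2.0, 0.3])
rho_r, grad_r = toy_density(points @ R.T + shift, centers @ R.T + shift)
fp_rot = density_fingerprint(rho_r, grad_r, weights, s_centers)

# Reorder both the atoms and the grid points.
perm = rng.permutation(len(points))
rho_p, grad_p = toy_density(points[perm], centers[[2, 0, 1]])
fp_perm = density_fingerprint(rho_p, grad_p, weights[perm], s_centers)
```

In this sketch, rotating and translating the molecule together with its grid, or reordering the atoms and grid points, should leave the fingerprint unchanged up to round-off. A real descriptor would also need to handle molecule-fixed integration grids and the near-zero-density tails more carefully than the simple floor used here.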

*RAD gratefully acknowledges financial support from: a Faculty Early Career Development (CAREER) Award from the National Science Foundation (CHE-1945676), a Machine Learning in the Chemical Sciences & Engineering Award from the Camille and Henry Dreyfus Foundation (ML-22-034), a Sloan Research Fellowship from the Alfred P. Sloan Foundation, and the Research Experience for Undergraduates program at the Cornell Center for Materials Research (DMR-1757420 and DMR-1719875). RAD also acknowledges computational resources provided by the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Publications: Z. M. Sparrow, B. G. Ernst, T. K. Quady, and R. A. DiStasio Jr., Uniting Non-Empirical and Empirical Density Functional Approximation Strategies Using Constraint-Based Regularization, J. Phys. Chem. Lett. 13, 6896-6904 (2022).

Z. Shen, Y. Yang, Z. M. Sparrow, B. G. Ernst, T. K. Quady, R. Kang, J. Lee, Y. Yang, L. Tu, and R. A. DiStasio Jr., Learning Molecular Conformational Energies Using Semi-Local Density Fingerprints, J. Phys. Chem. Lett. in press (2025).

Presenters

  • Robert A. DiStasio Jr.

    • Cornell University

Authors

  • Robert A. DiStasio Jr.

    • Cornell University