Picking a Paradigm: Insight vs. Active Machine Learning for NaCl Solvation in Water

ORAL

Abstract

It’s no secret that machine learning has exploded in popularity, in both the scientific disciplines and industry. As these methods have proliferated, separate groups developing their own methodologies have already produced a new kind of “zoo” of machine learning frameworks. Several studies have demonstrated the capabilities of these frameworks, showing that they can efficiently reach reference-level accuracy across a broad range of systems.

Despite these achievements, the universality of the trained models is often an afterthought. This leaves other researchers to test whether published networks transfer to their own systems of interest, which often forces them to build their own problem-specific networks. The fundamental questions then become (i) how can one maximize the representativeness of the training data, and (ii) how small can the training set be while still yielding sufficient accuracy? Here, we frame these questions as a choice between two paradigms: insight learning, in which one must capture the behavior of transition-state regions and therefore seed the training set to roughly cover the bounds of the relevant potential energy surface; and active learning, in which outliers in small, generated subsets are regularly recalculated and added as new reference targets during training.
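A minimal sketch of the active-learning loop described above, in which outliers from small generated subsets are recalculated as new reference targets. Several assumptions are not from the abstract: "outlier" is taken to mean high disagreement within a committee of models (query by committee), a 1D toy function (`reference_energy`, hypothetical) stands in for the expensive reference calculation, and each committee member is a piecewise-linear interpolant on a bootstrap resample rather than a neural network potential. All function names and parameters are illustrative only.

```python
import math
import random
import statistics


def reference_energy(x):
    """Hypothetical stand-in for an expensive reference calculation (e.g. DFT)."""
    return math.sin(3 * x) + 0.5 * x * x


def interpolate(points, x):
    """Piecewise-linear prediction from sorted (x, energy) points; flat beyond the ends."""
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, e0), (x1, e1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return e0 + (x - x0) / (x1 - x0) * (e1 - e0)


def fit_member(train, rng):
    """One committee member: a bootstrap resample of the labelled data, duplicates dropped."""
    return sorted({rng.choice(train) for _ in range(len(train))})


def disagreement(committee, x):
    """Spread of the committee's predictions at x, used here as the outlier score."""
    return statistics.pstdev([interpolate(member, x) for member in committee])


def active_learning(pool, n_seed=3, n_rounds=5, subset_size=10, batch_size=2,
                    n_members=8, seed=0):
    """Grow a labelled set by repeatedly labelling the most uncertain candidates."""
    rng = random.Random(seed)
    candidates = list(pool)
    rng.shuffle(candidates)
    # Small random seed set; no attempt is made to cover the whole surface up front.
    labelled = [(x, reference_energy(x)) for x in candidates[:n_seed]]
    candidates = candidates[n_seed:]
    for _ in range(n_rounds):
        committee = [fit_member(labelled, rng) for _ in range(n_members)]
        # Draw a small subset of unlabelled configurations to screen.
        subset = rng.sample(candidates, min(subset_size, len(candidates)))
        subset.sort(key=lambda x: disagreement(committee, x), reverse=True)
        # Recalculate the worst outliers with the reference method and add them.
        for x in subset[:batch_size]:
            labelled.append((x, reference_energy(x)))
            candidates.remove(x)
    return labelled


if __name__ == "__main__":
    pool = [i / 10 for i in range(-20, 21)]
    for x, e in sorted(active_learning(pool)):
        print(f"x = {x:+.1f}  E_ref = {e:+.3f}")
```

By contrast, an insight-learning setup would skip the loop and instead choose the initial training set so that it spans the relevant surface, including transition-state regions, before training begins.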

* This work was funded by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, under Award No. DE-SC0019394, as part of the CCS Program.

Presenters

  • Alec Wills

    Stony Brook University

Authors

  • Alec Wills

    Stony Brook University

  • Marcio Sampaio

    Universidade Federal do ABC - Brazil

  • Helena Donaldson

    Stony Brook University

  • Luana Pedroza

    Universidade de São Paulo - Brazil, Universidade Federal do ABC - Brazil

  • Marivi Fernandez-Serra

    Stony Brook University