Transferable diversity – a data-driven representation of chemical space

ORAL

Abstract

While transferability in general-purpose machine learning for chemistry should benefit from diverse training data, a rigorous understanding of transferability and its interplay with chemical representation remains an open problem. In this talk, I will introduce a transferability framework and apply it to a controllable data-driven model for developing density functional approximations (DFAs) [1]. This framework reveals that human intuition introduces chemical biases that can hamper the transferability of data-driven DFAs, and it allowed us to identify strategies for eliminating them. I will also show that the uncritical use of large training sets can actually hinder the transferability of DFAs, contrary to the typical "more is more" expectation. Finally, I will demonstrate how our transferability framework yields transferable diversity, a cornerstone principle of data curation for developing general-purpose machine learning models in chemistry.

* SV acknowledges funding from the SNSF Starting Grant project (TMSGI2 211246). TG was supported by an Australian Research Council (ARC) Discovery Project (DP200100033) and Future Fellowship (FT210100663). SD was supported by an Australian Research Council (ARC) Discovery Project (DP200100033) and by the Ministry of Education, Singapore, under its Research Centre of Excellence award to the Institute for Functional Intelligent Materials, with Project No. EDUNC33-18-279-V12. BC acknowledges research funding from the Japan Society for the Promotion of Science (22H02080) and generous grants of computer time from the RIKEN Information Systems Division (Q23266), Japan.

Publication: [1] Gould T, Chang B, Dale S, Vuckovic S. Transferable diversity – a data-driven representation of chemical space. ChemRxiv. Cambridge: Cambridge Open Engage; 2023. [https://chemrxiv.org/engage/chemrxiv/article-details/6511601aed7d0eccc32e3ace]

Presenters

  • Stefan Vuckovic

    University of Fribourg

Authors

  • Stefan Vuckovic

    University of Fribourg

  • Tim Gould

    Griffith University

  • Bun Chan

    Graduate School of Engineering, Nagasaki University, Bunkyo 1-14, Nagasaki 852-8521, Japan

  • Stephen G Dale

    Dalhousie University
