UV/vis absorption spectra database auto-generated for optical applications via the Argonne data science program

ORAL

Abstract

A large corpus of materials and experimental data exists in the historic scientific literature. Natural language processing and data-mining approaches can be applied to curated scientific literature to extract chemical information for targeted functionalities and applications. This talk focuses on the development of a UV/vis absorption spectra database by means of a complex quantum chemistry workflow built on top of the ChemDataExtractor tool [1], leveraging the DOE Argonne Leadership Computing Facility as part of a Data Science Program project allocation. We show results of retrieving chemical information and experimental properties from a large sample of the scientific literature (~400,000 documents) with ChemDataExtractor. Selected electronic structure properties of a large subset of these compounds are modeled using quantum chemistry workflows for benchmarking and validation. Finally, we discuss the quality of the database, based on validation metrics, and its applicability to optical applications.
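As a rough illustration of what comparing text-mined spectra against quantum chemistry calculations can involve, here is a minimal sketch. It assumes that each entry pairs an experimental absorption maximum λmax (in nm) extracted from the literature with a computed excitation energy (in eV) for the same compound. The `Pair` record, the `validation_metrics` helper, the chosen metrics (mean absolute error, root-mean-square error, mean signed error) and the 0.3 eV tolerance are hypothetical choices for illustration. They are not the authors' workflow or part of the ChemDataExtractor API.

```python
import math
from dataclasses import dataclass

# hc expressed in eV*nm, so that E[eV] = HC_EV_NM / wavelength[nm]
HC_EV_NM = 1239.84198


@dataclass
class Pair:
    """Hypothetical record: one text-mined peak matched to one computed energy."""
    compound: str
    lambda_exp_nm: float   # experimental absorption maximum mined from text (assumed field)
    energy_calc_ev: float  # computed excitation energy for the same compound (assumed field)


def nm_to_ev(wavelength_nm: float) -> float:
    """Convert a wavelength in nm to a photon energy in eV."""
    return HC_EV_NM / wavelength_nm


def validation_metrics(pairs, tol_ev=0.3):
    """Compare computed vs. experimental energies (all errors in eV).

    Returns the mean absolute error, root-mean-square error, mean signed
    error (computed minus experiment) and the fraction of compounds whose
    absolute error is within tol_ev (a hypothetical acceptance threshold).
    """
    if not pairs:
        raise ValueError("need at least one experiment/calculation pair")
    errors = [p.energy_calc_ev - nm_to_ev(p.lambda_exp_nm) for p in pairs]
    n = len(errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MSE": sum(errors) / n,
        "frac_within_tol": sum(1 for e in errors if abs(e) <= tol_ev) / n,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only, not data from this work.
    demo = [Pair("compound-A", 400.0, 3.2), Pair("compound-B", 550.0, 2.1)]
    print(validation_metrics(demo))
```

Errors are compared in energy rather than in wavelength because a fixed wavelength error corresponds to very different energy errors at the blue and red ends of the spectrum. Treating λmax as a proxy for a vertical excitation energy is itself an approximation, which is one reason such comparisons serve as benchmarks rather than exact checks.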

Presenters

  • Ganesh Sivaraman

    Argonne Leadership Computing Facility, Argonne National Laboratory

Authors

  • Edward J. Beard

    Cavendish Laboratory, Department of Physics, University of Cambridge

  • Ganesh Sivaraman

    Argonne Leadership Computing Facility, Argonne National Laboratory

  • Alvaro Vazquez-Mayagoitia

    Argonne Leadership Computing Facility, Argonne National Laboratory

  • Venkatram Vishwanath

    Argonne Leadership Computing Facility, Argonne National Laboratory

  • Jacqueline M. Cole

    Cavendish Laboratory, Department of Physics, University of Cambridge