OMDB-GAP1: A new dataset for band gap predictions for large organic crystal structures

ORAL

Abstract

Large datasets of ab initio calculations have enabled many pioneering studies of machine learning applied to quantum-chemical systems. For example, machine learning models have achieved chemical accuracy on the popular QM9 dataset, which contains small organic molecules. Here, we present a new, more challenging dataset of 12,500 large organic crystal structures and their corresponding DFT band gaps, freely available at https://omdb.diracmaterials.org/dataset. The dataset is based on the Organic Materials Database (OMDB), which hosts electronic properties of previously synthesized organic crystal structures. With an average of 85 atoms per unit cell, this dataset poses a new challenge for machine learning applications. We also evaluate the performance of two recent machine learning models on this new dataset: Kernel Ridge Regression with the Smooth Overlap of Atomic Positions (SOAP) kernel and the deep learning model SchNet.
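As a rough illustration of the kernel ridge regression step named in the abstract, the sketch below fits a regressor with a normalized dot-product kernel raised to a power, a form commonly used for SOAP-type kernels. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (soap_like_kernel, krr_fit, krr_predict), the exponent zeta, the regularization lam, and the random placeholder descriptors and targets are all hypothetical and are not taken from the OMDB-GAP1 dataset.

```python
import numpy as np

def soap_like_kernel(A, B, zeta=2):
    """Normalized dot-product kernel raised to the power zeta (assumed form)."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return (A @ B.T) ** zeta

def krr_fit(K, y, lam=1e-3):
    """Solve the regularized linear system (K + lam * I) alpha = y."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def krr_predict(K_test_train, alpha):
    """Predict targets as kernel-weighted sums over the training set."""
    return K_test_train @ alpha

# Placeholder data: random vectors standing in for per-structure descriptors.
# These are NOT OMDB structures or DFT band gaps.
rng = np.random.default_rng(0)
X_train = rng.random((50, 8))
y_train = X_train @ rng.random(8)   # synthetic targets
X_test = rng.random((10, 8))

lam = 1e-3
K_train = soap_like_kernel(X_train, X_train)
alpha = krr_fit(K_train, y_train, lam)
y_pred = krr_predict(soap_like_kernel(X_test, X_train), alpha)
```

In practice the placeholder vectors would be replaced by descriptors computed from the crystal structures, and zeta and lam would be chosen by cross-validation.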

Presenters

  • Bart Olsthoorn

    NORDITA

Authors

  • Bart Olsthoorn

    NORDITA

  • Richard Geilhufe

    NORDITA, Nordic Institute for Theoretical Physics, Stockholm University and KTH Royal Institute of Technology, Stockholm, Sweden

  • Stanislav Borysov

    Technical University of Denmark (DTU)

  • Alexander Balatsky

    NORDITA, Nordic Institute for Theoretical Physics, Stockholm; Institute for Materials Science, Los Alamos National Laboratory; Department of Physics, University of Connecticut, Storrs, CT 06269, USA