Impact of Dataset Uncertainties on Machine Learning Model Predictions: The Example of Polymer Glass Transition Temperatures

ORAL

Abstract

Data-driven methods are seeing a revival and are deeply influencing multiple aspects of materials research. Materials property data from computations or experiments are being used to create surrogate models using machine learning (ML) techniques. These models can provide rapid predictions of the properties of new materials at a fraction of the cost of actual experimentation or computation. Moreover, a variety of techniques are being explored to “invert” the property prediction pipeline, allowing materials to be designed with a desired set of target property values. The quality of the resulting surrogate model depends on the quality (and quantity) of the dataset used in model training. Often, different experimental studies report different values for the same property of the same material. This may be due to variations in measurement techniques, conditions, and sample quality, among other factors. How should one treat these variances, and what is their impact? This question needs to be answered specifically, since it is paramount to developing a good prediction model and helps in understanding its limitations.

Presenters

  • Anurag Jha

    Georgia Institute of Technology

Authors

  • Anurag Jha

    Georgia Institute of Technology

  • Anand Chandrasekaran

    Georgia Institute of Technology

  • Chiho Kim

    Georgia Institute of Technology

  • Ramamurthy Ramprasad

    School of Materials Science and Engineering, Georgia Institute of Technology; University of Connecticut