Imbalance-Aware Small-Data Machine Learning for Dimensionality Prediction in Hybrid Metal Halides
ORAL
Abstract
We present an imbalance-aware, small-data machine-learning workflow for predicting the structural dimensionality (0D–3D) of hybrid metal halides, a key factor governing exciton binding, charge transport, and environmental stability. The model is trained on a highly imbalanced set of hybrid metal halides from the HybriD3 database (≈67% 2D). Chemically informed descriptors capture steric and connectivity effects, while interaction-based descriptors account for nonlinear relationships. We address the class imbalance with synthetic oversampling and targeted feature engineering, then build stacked ensemble classifiers to improve accuracy. This approach improves minority-class accuracy while maintaining overall performance. Five-fold cross-validation yields a mean F1 score of 0.964 with low variance, indicating strong generalization. The resulting framework is interpretable, efficient, and applicable to other imbalanced, small-scale materials datasets.
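The oversampling and scoring steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the oversampling algorithm or the F1 averaging scheme, so the SMOTE-style interpolation and the macro-averaged F1 below are assumptions, and the function names, two-column feature vectors, and class counts are hypothetical toy values rather than HybriD3 data.

```python
import math
import random
from collections import Counter


def smote_like_oversample(X, y, k=3, seed=0):
    """Balance classes by interpolating between same-class neighbours.

    Assumption: a SMOTE-style scheme; the abstract only says
    "synthetic oversampling".
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the majority class
    X_out, y_out = [list(row) for row in X], list(y)
    for label, n in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        if len(members) < 2:
            continue  # cannot interpolate from a single sample
        for _ in range(target - n):
            base = rng.choice(members)
            others = [m for m in members if m is not base]
            # k nearest same-class neighbours of the base point
            neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
            mate = rng.choice(neighbours)
            t = rng.random()  # position along the segment base -> mate
            X_out.append([b + t * (m - b) for b, m in zip(base, mate)])
            y_out.append(label)
    return X_out, y_out


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so minority classes count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two made-up descriptors per compound, 2D-dominated.
X = [[1.0, 0.2], [1.1, 0.3], [0.9, 0.25], [1.2, 0.15], [1.05, 0.22], [0.95, 0.28],
     [2.0, 1.0], [2.2, 1.1],
     [3.0, 0.1], [3.1, 0.2], [2.9, 0.15]]
y = ["2D"] * 6 + ["0D"] * 2 + ["3D"] * 3

X_bal, y_bal = smote_like_oversample(X, y)
balanced_counts = Counter(y_bal)

# A classifier that always answers "2D" looks acceptable on accuracy alone
# but is penalized by macro F1.
baseline_f1 = macro_f1(["2D", "2D", "0D"], ["2D", "2D", "2D"])
```

In this toy setting every class ends up with six samples after oversampling. The always-"2D" baseline gets two of three labels right yet scores a macro F1 of only 0.4, which is why a class-averaged metric is a reasonable choice when one class dominates. In a cross-validated workflow, oversampling would normally be applied to the training folds only, so that synthetic points do not leak into the held-out fold.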
*This work is supported by the National Science Foundation DMREF award numbers 2323546, 2538963, and 2323547. Part of this work used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Presenters
Mariia Karabin
- Oak Ridge National Laboratory
- Middle Tennessee State University