Expert-Grounded Prompt Engineering for Extracting Lattice Constants of High Entropy Alloys from Scientific Publications using Large Language Models
ORAL
Abstract
Automated extraction of data from the experimental materials science literature using large language models (LLMs) offers an efficient alternative to manual curation. While LLMs can extract materials data, existing prompt engineering methods rely on human refinement, which limits their generalizability across publications. We developed an automated prompt optimization workflow that uses gradient-based optimization guided by expert annotations to refine prompts systematically, without human-in-the-loop refinement. We applied this approach to retrieve lattice constants from over 2,273 publications on high entropy alloys (HEAs), successfully extracting data for 1,870 compositions. The workflow achieved precision and recall exceeding 0.90 for as-cast, single-phase body-centered cubic HEAs, and the curated data were used to train machine learning models that predict lattice constants. Our model showed good predictive performance, and its predicted lattice constants were in turn used to reliably predict the yield strength of HEAs. Our analysis revealed three LLM limitations, namely contextual hallucination, semantic misinterpretation, and unit conversion errors, which emphasize the need for validation protocols. This work demonstrates that using expert-annotated datasets to guide automated prompt optimization improves the reliability of large-scale data extraction.
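The precision and recall figures above are measured against expert annotations. The following is a minimal sketch of how such scoring could be done, not the authors' implementation. It assumes that each extracted record is keyed by a (publication, composition) pair, that values are already normalized to Å, and that a match requires agreement within a fixed tolerance. The function name `score_extraction`, the 0.005 Å tolerance, and the example values are all hypothetical. A score of this kind could also serve as the objective that an automated prompt optimizer tries to improve.

```python
# Hypothetical scoring sketch: compare LLM-extracted lattice constants with
# expert annotations. Keys are (publication_id, composition) pairs; values are
# assumed to be lattice constants already converted to angstroms.
Key = tuple[str, str]


def score_extraction(extracted: dict[Key, float],
                     annotated: dict[Key, float],
                     tol: float = 0.005) -> tuple[float, float]:
    """Return (precision, recall).

    A record is a true positive only if its key appears in the annotations
    and its value agrees with the annotated value within `tol` angstroms
    (the tolerance is an assumption, not taken from the study).
    """
    tp = sum(
        1 for key, value in extracted.items()
        if key in annotated and abs(value - annotated[key]) <= tol
    )
    fp = len(extracted) - tp  # hallucinated or wrong-valued records
    fn = len(annotated) - tp  # annotated records that were missed or wrong
    precision = tp / (tp + fp) if extracted else 0.0
    recall = tp / (tp + fn) if annotated else 0.0
    return precision, recall


# Illustrative (made-up) data: one correct value, one value left in nm
# instead of angstroms, and one annotated alloy the extraction missed.
annotated = {
    ("paper_A", "HfNbTaTiZr"): 3.40,
    ("paper_A", "MoNbTaW"): 3.21,
    ("paper_B", "NbTaTiV"): 3.25,
}
extracted = {
    ("paper_A", "HfNbTaTiZr"): 3.40,
    ("paper_A", "MoNbTaW"): 0.321,  # unit conversion error (nm not converted)
}

precision, recall = score_extraction(extracted, annotated)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.33
```

Counting a wrong-valued record as both a false positive and a false negative means that an error such as an unconverted nanometre value lowers both metrics. This is one way unit conversion errors would surface in a validation protocol.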
*University of Virginia Research Innovation Award, University of Virginia Research Interest Group initiative
Presenters
Shunshun Liu
University of Virginia