Expert-Grounded Prompt Engineering for Extracting Lattice Constants of High Entropy Alloys from Scientific Publications using Large Language Models

Shunshun Liu; Talon Booth; Yangfeng Ji; Wesley Reinhart; Prasanna Balachandran

Oral-In-person

Abstract

Automated extraction of data from experimental materials science literature using large language models (LLMs) offers an efficient alternative to manual approaches. While LLMs can extract materials data, existing prompt engineering methods require human refinement, which limits their generalizability across publications. We developed an automated prompt optimization workflow that uses gradient-based optimization with expert annotations to systematically refine prompts without human-in-the-loop refinement. This approach was applied to retrieve lattice constants from over 2,273 publications on high entropy alloys (HEAs), successfully extracting data for 1,870 compositions. The workflow achieved precision and recall exceeding 0.90 for as-cast, single-phase body-centered cubic HEAs, allowing machine learning models to be trained on the curated data to predict lattice constants. Our model showed good predictive performance, which allowed us to use its output to reliably predict the yield strength of HEAs. Our analysis revealed three LLM limitations: contextual hallucination, semantic misinterpretation, and unit conversion errors, emphasizing the need for validation protocols. This work demonstrates that using expert-annotated datasets to guide automated prompt optimization improves the reliability of large-scale data extraction.
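
As a rough illustration of how precision and recall can be scored against expert annotations, the sketch below compares extracted lattice constants with annotated values per composition. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the function name `precision_recall`, the matching tolerance `tol`, the composition labels, and all numeric values are hypothetical.

```python
from typing import Dict, Tuple


def precision_recall(
    extracted: Dict[str, float],
    annotated: Dict[str, float],
    tol: float = 0.01,
) -> Tuple[float, float]:
    """Score extracted lattice constants against expert annotations.

    An extracted value counts as a true positive when its composition
    appears in the annotations and the values agree within `tol`
    (hypothetical tolerance, in angstroms).
    """
    true_pos = sum(
        1
        for comp, value in extracted.items()
        if comp in annotated and abs(value - annotated[comp]) <= tol
    )
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(annotated) if annotated else 0.0
    return precision, recall


# Hypothetical values for illustration only (angstroms); not data from the study.
annotated = {"alloy_A": 3.20, "alloy_B": 3.30, "alloy_C": 3.40, "alloy_D": 3.10}
extracted = {
    "alloy_A": 3.20,  # correct
    "alloy_B": 3.30,  # correct
    "alloy_C": 0.34,  # value left in nm: a unit conversion error
    "alloy_E": 3.00,  # composition absent from the annotations
}

precision, recall = precision_recall(extracted, annotated)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case the unconverted nanometre value and the unannotated composition both lower precision, while the two annotated compositions that were not matched lower recall, which is the kind of failure a validation protocol would need to flag.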

March 17, 2026, 9:12 AM – March 17, 2026, 9:24 AM

Presenters

Shunshun Liu
- University of Virginia

Authors

Shunshun Liu
- University of Virginia
Talon Booth
Yangfeng Ji
Wesley Reinhart
- Pennsylvania State University
Prasanna Balachandran