Large Language Models for Evaluating Evidence of Authenticity in Student Writing: A Preliminary Investigation of Retrieval-Augmented Generation Systems
ORAL
Abstract
Physics education research relies heavily on qualitative analysis of student responses to understand learning processes, but traditional human coding is time-intensive and limits the scale of the research that is possible. This preliminary study investigates whether Large Language Models (LLMs) can accurately retrieve and apply thematic analysis frameworks to student responses using Retrieval-Augmented Generation (RAG). We developed a locally run, few-shot LLM system using Llama 3.1, integrated with RAG capabilities through the LangChain libraries. The system was tested on its ability to retrieve and apply thematic codes about authenticity to student reflections from introductory physics laboratories.
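The retrieve-then-apply idea described above can be illustrated with a minimal sketch. It is not the study's implementation: the codebook entries, the example response, and all function names are hypothetical, and a simple bag-of-words cosine similarity stands in for the embedding-based retrieval that a RAG pipeline would normally use. The call to a local model is deliberately left out; the sketch stops at the assembled few-shot prompt.

```python
import math
import re
from collections import Counter

# Hypothetical codebook: the study's actual thematic codes are not given here.
CODEBOOK = {
    "real_world_relevance": "student connects the lab to real world science or everyday life",
    "agency": "student describes making their own decisions about the experiment design",
    "uncertainty": "student reflects on measurement uncertainty and unexpected results",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(response, k=2):
    """Return the names of the k codebook entries most similar to the response."""
    query = Counter(tokenize(response))
    scored = [(cosine(query, Counter(tokenize(desc))), name) for name, desc in CODEBOOK.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

def build_prompt(response, examples, k=2):
    """Assemble a few-shot prompt from retrieved code definitions and coded examples."""
    lines = ["Candidate codes:"]
    lines += [f"- {name}: {CODEBOOK[name]}" for name in retrieve(response, k)]
    lines.append("Coded examples:")
    lines += [f'- "{text}" -> {code}' for text, code in examples]
    lines.append(f'Assign one code to: "{response}"')
    return "\n".join(lines)

# Hypothetical student reflection and few-shot example, invented for illustration.
response = "We got to make our own decisions about how to design the experiment."
examples = [("This felt like what real scientists do.", "real_world_relevance")]
prompt = build_prompt(response, examples)
```

In a full pipeline, the prompt would then be passed to the locally hosted model, and the retrieval step would typically query a vector store of embedded codebook passages rather than comparing word counts.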
Current analysis shows general, better-than-chance agreement between human and LLM coding. However, inconsistencies emerged in complex thematic categorizations that required more nuanced interpretation. These preliminary findings suggest that RAG-enhanced LLMs show promise for scaling qualitative analysis in physics education research, although enhanced human feedback integration and iterative model refinement are needed before use in more generalized contexts.
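The abstract does not say which statistic was used to compare agreement against chance. One common chance-corrected measure for two coders is Cohen's kappa, sketched below under that assumption. The labels are invented for illustration and are not study data; the function name is hypothetical.

```python
from collections import Counter

def cohen_kappa(human, model):
    """Cohen's kappa for two equal-length lists of categorical codes."""
    if len(human) != len(model) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    # Observed agreement: fraction of items given the same code by both coders.
    p_o = sum(h == m for h, m in zip(human, model)) / n
    # Expected chance agreement, from each coder's marginal code frequencies.
    h_counts, m_counts = Counter(human), Counter(model)
    p_e = sum(h_counts[c] * m_counts[c] for c in h_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes, invented purely for illustration (not study data).
human_codes = ["authentic", "authentic", "not_authentic",
               "authentic", "not_authentic", "not_authentic"]
llm_codes = ["authentic", "not_authentic", "not_authentic",
             "authentic", "not_authentic", "authentic"]
kappa = cohen_kappa(human_codes, llm_codes)
```

A kappa of 0 corresponds to agreement no better than chance and 1 to perfect agreement, so a value clearly above 0 is what "better than chance" would mean under this measure.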
*NSF PHY#2310035 and Cornell University McNair Scholars Program
Presenters
Shaunjae M Suarez, Cornell University