Identifying authenticity in student writing using text embeddings

ORAL

Abstract

The use of machine learning and Large Language Models (LLMs) is a new and exciting area of Physics Education Research. In our introductory physics lab courses, hundreds of students write memos as a final assignment, describing their experiences and projects. To analyze authenticity, we code these memos, characterizing them by the themes present, using a previously developed codebook. This coding is currently done by hand, which requires a large amount of time and energy at this scale. To make larger analyses faster and more feasible, we have been building on text embedding methods developed by Odden et al. Text embeddings take text as input and, using a pre-trained LLM, return a high-dimensional 'meaning' vector that represents the themes present in the text. I will discuss the novel methods we are exploring that use text embeddings to automate the sorting of longer student reflections into specific codes, improve speed, and assess accuracy against hand-coded data. Adopting these methods would allow faster, more efficient, and more consistent thematic coding, and would make it accessible to a broader set of researchers.

*NSF PHY#2310035 and Cornell University Nexus Scholars Program
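
To make the embedding idea in the abstract concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a nearest-centroid approach: each code is represented by the average embedding of its hand-coded examples, a new text is assigned the code with the highest cosine similarity, and the result is compared against hand-coded labels. The function toy_embed is a hypothetical stand-in (a hashed bag of words) for a pre-trained embedding model, and the code names and example sentences are invented for illustration only.

```python
import zlib
import numpy as np

DIM = 256  # toy dimensionality, chosen arbitrarily for this sketch


def toy_embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for a pre-trained embedding model.

    Hashes each word into one of DIM bins and returns a unit-length vector.
    A real embedding model would capture meaning, not just word overlap.
    """
    vec = np.zeros(DIM)
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def code_centroids(labeled):
    """Average the embeddings of the hand-coded examples for each code."""
    groups = {}
    for text, code in labeled:
        groups.setdefault(code, []).append(toy_embed(text))
    return {code: np.mean(vecs, axis=0) for code, vecs in groups.items()}


def assign_code(text, centroids):
    """Return the code whose centroid is most similar to the text's embedding."""
    vec = toy_embed(text)
    return max(centroids, key=lambda code: cosine(vec, centroids[code]))


def accuracy(predictions, hand_codes):
    """Fraction of automated codes that match the hand-assigned codes."""
    matches = sum(p == h for p, h in zip(predictions, hand_codes))
    return matches / len(hand_codes)


# Invented examples with placeholder code names (not from the actual codebook).
hand_coded = [
    ("we designed our own experiment", "code_A"),
    ("the lab manual told us every step", "code_B"),
]
centroids = code_centroids(hand_coded)
predicted = [assign_code(text, centroids) for text, _ in hand_coded]
print(accuracy(predicted, [code for _, code in hand_coded]))
```

In practice, toy_embed would be replaced by a call to an actual embedding model, and accuracy would be measured on hand-coded memos that were held out when the centroids were built.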

Presenters

  • Carlos Kaskoun
    • Cornell University

Authors

  • Carlos Kaskoun
    • Cornell University
  • Mike Verostek
    • University of Rochester
    • Cornell University
  • Natasha G Holmes
    • Cornell University