Large Language Models Think in Curved Space

ORAL

Abstract

Previous work in the machine learning interpretability community has identified low-dimensional stochastic trajectories inside modern large language models (LLMs). These trajectories are termed Lines of Thought (LoT) [1], since they are conjectured to carry important clues about how attention-based neural networks transform and integrate information. In this work, we analyze the LoT of various LLMs and show that LLMs progress through distinct learning stages. Finally, we discuss the ramifications of our analysis for the trainedness, robustness, and steerability of LLMs.

[1]: Sarfati, R., Liu, T. J., Boullé, N., & Earls, C. J. (2024). Lines of thought in large language models. arXiv preprint arXiv:2410.01545.

Presenters

  • Toni Jianbang Liu
    • Cornell University

Authors

  • Toni Jianbang Liu
    • Cornell University
  • Raphael Sarfati
    • University of Colorado, Boulder
  • Nicolas Boullé
    • Imperial College London
  • Christopher Earls
    • Cornell University