Interpreting Transformers through the Lens of Physics and Dynamical Systems

POSTER

Abstract

Transformer-based large language models (LLMs) have demonstrated remarkable linguistic and reasoning capabilities, yet their internal dynamics remain poorly understood. This work reviews recent efforts to interpret LLMs through the lens of physics and nonlinear dynamical systems. We explore how hidden states evolve as trajectories in a high-dimensional phase space, governed by layer-wise transformations analogous to physical time evolution. Attention mechanisms are discussed as nonlocal field interactions, while loss landscapes are treated as energy surfaces on which training resembles free-energy minimization. Concepts from thermodynamics, such as entropy, temperature, and criticality, offer insights into emergent behaviors, scaling laws, and phase transitions in model performance. By integrating principles from statistical mechanics, information geometry, and dynamical systems theory, this review aims to establish a unified framework for understanding language models as physical systems. Such perspectives may guide future research in interpretability, robustness, and training efficiency.
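
As one concrete illustration of the thermodynamic analogy, the softmax used to turn logits into token probabilities has the form of a Boltzmann distribution, with the sampling temperature T playing the role of physical temperature and the negated logits playing the role of energies. The sketch below is not taken from the poster: the function names `boltzmann_softmax` and `shannon_entropy` and the example logits are assumptions chosen purely for illustration. It shows that raising T increases the entropy of the output distribution.

```python
import math


def boltzmann_softmax(logits, temperature=1.0):
    """Softmax with temperature: p_i is proportional to exp(z_i / T).

    Reading E_i = -z_i as an energy, this is a Boltzmann distribution.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)  # plays the role of the partition function
    return [w / total for w in weights]


def shannon_entropy(probs):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical next-token logits, for illustration only.
logits = [2.0, 1.0, 0.1]
for T in (0.5, 1.0, 2.0):
    p = boltzmann_softmax(logits, T)
    print(f"T={T}: entropy={shannon_entropy(p):.3f} nats")
```

At low T the distribution concentrates on the highest logit (low entropy). As T grows it approaches the uniform distribution, whose entropy for three outcomes is ln 3.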

Presenters

  • Tirtho Roy

    • Iowa State University

Authors

  • Tirtho Roy

    • Iowa State University
  • Ushashi Bhattacharjee

    • Iowa State University
  • Adrito Roy

    • Barishal Zilla School