Universality of LLM mechanisms across scale and diversity

ORAL

Abstract

Large language models (LLMs) have become ubiquitous, both in everyday use and as a subject of scientific experiments and theory regarding their capabilities and learning mechanisms. However, it remains to be shown whether the mechanisms of LLMs are universal across data diversity, model scale, and initialization. We train our own 1.7B-parameter LLM on state-of-the-art data used in performant models, and we also train variants that differ only in the random seed of model initialization, the data ordering, and the subset of data used. Using information-theoretic metrics, we show that these model variants have different output distributions but become more similar over the course of training. Our results quantify when task-specific abilities emerge in training, examine the universality of these abilities, and contribute to reproducibility in AI research.

*L.M.S. is supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-2039656.
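
As an illustration of the kind of comparison the abstract describes, the sketch below computes the Jensen-Shannon divergence between the next-token distributions of two model variants. The abstract does not say which information-theoretic metrics were used, so the choice of Jensen-Shannon divergence is an assumption, and the function names (`softmax`, `js_divergence`, `mean_js`) and the array layout (one row of logits per token position) are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Convert logits of shape (positions, vocab) to probabilities."""
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilize the exponent
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (in nats) per position; lies in [0, ln 2]."""
    m = 0.5 * (p + q)  # mixture of the two distributions

    def kl(a, b):
        # eps guards log(0); terms with a == 0 contribute zero
        return np.sum(a * (np.log(a + eps) - np.log(b + eps)), axis=-1)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mean_js(logits_a, logits_b):
    """Average divergence between two variants over all token positions."""
    return float(np.mean(js_divergence(softmax(logits_a), softmax(logits_b))))
```

Evaluating `mean_js` on the same held-out tokens at matched training checkpoints would give one curve per pair of variants. A curve that decreases with training step would correspond to the convergence of output distributions reported above, under the assumption that this metric is a reasonable stand-in for the ones actually used.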

Presenters

  • Lindsay Maleckar Smith

    • Princeton University

Authors

  • Lindsay Maleckar Smith

    • Princeton University
  • Gautam Reddy

    • Princeton University
  • David J Schwab

    • The Graduate Center, City University of New York