Universality of LLM mechanisms across scale and diversity
ORAL
Abstract
Large language models (LLMs) have become ubiquitous, both in everyday use and as a subject of scientific experiment and theory concerning their capabilities and learning mechanisms. However, it remains to be shown whether the mechanisms of LLMs are universal across data diversity, model scale, and initialization. We train our own 1.7B-parameter LLM on state-of-the-art data used in performant models, and also train variants that differ only in the random seed of model initialization, in data ordering, and in the subset of data used. Using information-theoretic metrics, we show that these model variants have different output distributions, but that the distributions become more similar over the course of training. Our results quantify when task-specific abilities emerge during training, examine the universality of these abilities, and contribute to reproducibility in AI research.
*L.M.S. is supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-2039656.
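The abstract does not name the information-theoretic metrics it uses. As an illustration only, the sketch below compares the next-token distributions of two model variants with the Jensen-Shannon divergence, which is one common choice and is assumed here rather than taken from the talk. The function names and the toy logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)


def mean_js(logits_a, logits_b):
    """Average JS divergence over token positions for two model variants.

    logits_a and logits_b are lists of per-position logit vectors that the
    two variants produce on the same input (an assumption of this sketch).
    """
    scores = [
        js_divergence(softmax(a), softmax(b))
        for a, b in zip(logits_a, logits_b)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy logits over a 4-token vocabulary at 2 positions (made-up values).
    variant_a = [[2.0, 1.0, 0.1, -1.0], [0.5, 0.5, 0.5, 0.5]]
    variant_b = [[1.8, 1.2, 0.0, -0.9], [0.4, 0.6, 0.5, 0.5]]
    print(f"mean JS divergence: {mean_js(variant_a, variant_b):.6f} bits")
```

Evaluating such a score at successive training checkpoints would give one way to track whether variants converge over training; a value of 0 means the two distributions are identical, and 1 bit means they do not overlap.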
Presenters
- Lindsay Maleckar Smith, Princeton University