Specialization-generalization transition in exemplar-based in-context learning
ORAL
Abstract
In-context learning (ICL) is a striking behavior of pretrained transformers that allows models to generalize to unseen tasks after seeing only a few examples. We empirically investigate the conditions the pretraining distribution must satisfy for generalized ICL to emerge. A model that exhibits generalized ICL can generalize to new tasks outside the pretraining task distribution, while a model that exhibits specialized ICL generalizes only to new tasks within the pretraining task distribution. Previous work has focused on the number of distinct tasks the pretraining distribution must contain for the model to exhibit ICL in any form. Here, we introduce another axis of task diversity, based on the similarity between tasks, to study the emergence of generalized ICL in transformers trained on linear functions. We find that as task diversity increases, transformers undergo a transition from specialized to generalized ICL. We examine the role of task similarity and the number of distinct pretraining tasks in eliciting generalized ICL through a phase diagram, which illustrates the conditions on the pretraining distribution necessary for generalized ICL to emerge. We also explore the nature of the solutions learned by the transformer on both sides of the transition. Further experiments show that such specialization-generalization transitions persist in more complex, nonlinear settings.
*VN acknowledges research funds from the University of Sydney. DJS was partially supported by a Simons Fellowship in the MMLS and a Sloan Fellowship in Physics. LMS is supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-2039656.
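To make the two axes of task diversity in the abstract concrete, here is a minimal sketch of how a pretraining distribution over linear-function tasks might be parameterized by a number of distinct tasks and a task similarity. The abstract does not say how tasks are sampled, so everything here is an assumption for illustration: the function names `sample_task_pool` and `sample_prompt`, the `similarity` knob, and the choice to cluster unit-norm weight vectors around a shared center are hypothetical and are not the authors' construction.

```python
import numpy as np


def sample_task_pool(num_tasks, dim, similarity, rng):
    """Draw `num_tasks` unit-norm weight vectors clustered around a shared center.

    `similarity` in [0, 1] is a hypothetical knob (an assumption, not from the
    abstract): 1 makes every task identical, 0 makes tasks independent random
    directions on the unit sphere.
    """
    center = rng.standard_normal(dim)
    center /= np.linalg.norm(center)
    noise = rng.standard_normal((num_tasks, dim))
    noise /= np.linalg.norm(noise, axis=1, keepdims=True)
    # Blend the shared center with per-task random directions, then renormalize.
    tasks = similarity * center + (1.0 - similarity) * noise
    return tasks / np.linalg.norm(tasks, axis=1, keepdims=True)


def sample_prompt(task_pool, num_examples, rng):
    """Pick one task from the pool and generate (x, y) exemplars with y = w . x."""
    w = task_pool[rng.integers(len(task_pool))]
    xs = rng.standard_normal((num_examples, task_pool.shape[1]))
    ys = xs @ w
    return xs, ys, w


rng = np.random.default_rng(0)
pool = sample_task_pool(num_tasks=8, dim=4, similarity=0.9, rng=rng)
xs, ys, w = sample_prompt(pool, num_examples=16, rng=rng)
```

Under these assumptions, sweeping `num_tasks` and `similarity` gives a two-dimensional grid of pretraining distributions, the kind of grid over which a phase diagram like the one described in the abstract could be drawn.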
Presenters
- Chase Waring Goddard (Princeton University)