Grokking and emergent capabilities in deep learning

ORAL · Invited

Abstract

Over the last few years, deep neural networks have been trained on ever more data using ever more compute. Their performance, as measured by various metrics, exhibits two types of behavior. The first type is continuous improvement with more data and more compute; this improvement follows predictable power laws. The second type of behavior is more mysterious: performance on certain tasks can suddenly and sharply increase as a function of the amount of data or compute. This behavior is both unpredictable and uncontrollable, and is known under the names "emergent capabilities" and "grokking".
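For the first type of behavior, the scaling-law literature typically writes the test loss as a power law in compute (or, analogously, in dataset size). The form below is the standard ansatz from that literature, shown here for orientation; the constants are empirical fit parameters, not results from this talk.

```latex
% Canonical neural-scaling-law ansatz (standard form, not specific to this talk):
% test loss L as a function of compute C; L_\infty, a, and \alpha are fit empirically.
\[
  L(C) \;=\; L_\infty + a\,C^{-\alpha}
\]
```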

In this talk, I will describe a few minimal models, based on neural networks learning simple mathematical operations, that elucidate this behavior.
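To make the setup concrete, here is a minimal sketch of a grokking experiment on modular addition, in the spirit of the linked paper (arXiv:2301.02679). The hyperparameters (modulus, width, optimizer, weight decay, train fraction) are illustrative assumptions, not the talk's exact configuration; the qualitative signature to look for is train accuracy saturating quickly while test accuracy jumps much later.

```python
# Minimal grokking sketch on (a + b) mod p. Hyperparameters are
# illustrative assumptions, not the exact setup from the talk.
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 97                # modulus of the arithmetic task
frac_train = 0.4      # fraction of all p*p pairs used for training

# Full dataset: input is one-hot(a) concatenated with one-hot(b).
a, b = torch.meshgrid(torch.arange(p), torch.arange(p), indexing="ij")
pairs = torch.stack([a.flatten(), b.flatten()], dim=1)
x = torch.cat([nn.functional.one_hot(pairs[:, 0], p),
               nn.functional.one_hot(pairs[:, 1], p)], dim=1).float()
y = (pairs[:, 0] + pairs[:, 1]) % p

perm = torch.randperm(p * p)
n_train = int(frac_train * p * p)
train, test = perm[:n_train], perm[n_train:]

# A small two-layer network; weight decay is important for grokking.
model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(20000):
    opt.zero_grad()
    loss = loss_fn(model(x[train]), y[train])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            tr = (model(x[train]).argmax(1) == y[train]).float().mean()
            te = (model(x[test]).argmax(1) == y[test]).float().mean()
        # Train accuracy saturates early; test accuracy rises sharply
        # only much later -- the delayed-generalization signature.
        print(f"step {step:6d}  train acc {tr:.3f}  test acc {te:.3f}")
```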

* NSF CAREER Award DMR-2045181; Sloan Foundation; Laboratory for Physical Sciences through the Condensed Matter Theory Center

Publication: https://arxiv.org/pdf/2301.02679.pdf

Presenters

  • Andrey Gromov

    University of Maryland, College Park

Authors

  • Andrey Gromov

    University of Maryland, College Park