Same features, different encodings: three case studies of path dependence in grokking and learning.

ORAL

Abstract

Neural network training is a complicated dynamical process. Whether or not the outcome of training depends on the learning path has deep implications for how we can understand and use neural networks. Two extremes are grokking, where a network generalizes only after a long period of overtraining, and "steady" learning, where the training and test losses improve together. We investigate three simple tasks in which we induce both learning paths: classifying phases of the Ising model from snapshots, the modular addition problem in which grokking was first discovered, and the benchmark MNIST task.
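For readers unfamiliar with the modular addition task, the following Python sketch shows one common way to build its dataset. Every pair (a, b) with 0 <= a, b < p is labeled by (a + b) mod p, and only a fraction of all pairs is used for training. The modulus p = 97, the training fraction, and the function name `modular_addition_dataset` are illustrative assumptions, not the settings used in this work.

```python
import random


def modular_addition_dataset(p=97, train_fraction=0.3, seed=0):
    """Enumerate all pairs (a, b) labeled by (a + b) mod p and split them.

    p, train_fraction and seed are illustrative assumptions, not values
    taken from the abstract.
    """
    # All p * p input pairs, each paired with its label.
    pairs = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]
    # Shuffle deterministically so the train/test split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(pairs)
    n_train = int(train_fraction * len(pairs))
    return pairs[:n_train], pairs[n_train:]


train, test = modular_addition_dataset()
print(len(train), len(test))  # 2822 6587
```

In the grokking literature the training fraction is one of the settings that affects how long generalization is delayed. The value above is only a placeholder.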

Using techniques from interpretability and information geometry, we systematically contrast the features, encodings, and training trajectories of grokking and "steady" learning. First, we find that the features learned in our example problems are the same in both paths. The features of the network trained on Ising phases are particularly clear: the model learns to calculate the energy of a snapshot. Second, although the features are the same for grokking and steady learning, the efficiency of their encodings can differ dramatically, by up to an order of magnitude. Finally, we show that the accuracy plateau in grokking is typically associated with an exponential decay of the weights with the number of epochs, and that the grokking time appears to exhibit power-law scaling with the weight decay across more than four decades.
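The Python sketch below makes the Ising result concrete by computing the energy of a spin snapshot, the quantity the network trained on Ising phases learns to calculate. It assumes the standard nearest-neighbour Hamiltonian E = -J Σ⟨ij⟩ s_i s_j on a square L×L lattice with periodic boundaries, coupling J = 1 and zero external field. These conventions and the function names are assumptions, not details taken from this work.

```python
def ising_energy(spins, J=1.0):
    """Nearest-neighbour Ising energy of a square snapshot of +1/-1 spins.

    Assumed conventions: square L x L lattice, periodic boundaries,
    coupling J, zero external field.
    """
    L = len(spins)
    energy = 0.0
    for i in range(L):
        for j in range(L):
            s = spins[i][j]
            # Pair each spin with its right and down neighbours (wrapping
            # around the edges) so every bond is counted exactly once.
            energy -= J * s * (spins[i][(j + 1) % L] + spins[(i + 1) % L][j])
    return energy


def energy_per_spin(spins, J=1.0):
    """Energy divided by the number of spins, L * L."""
    return ising_energy(spins, J) / (len(spins) ** 2)


L = 8
all_up = [[1] * L for _ in range(L)]
checkerboard = [[1 if (i + j) % 2 == 0 else -1 for j in range(L)] for i in range(L)]
print(ising_energy(all_up), energy_per_spin(all_up))  # -128.0 -2.0
print(ising_energy(checkerboard))  # 128.0
```

Under these assumptions the energy per spin equals -2J in the fully ordered ferromagnetic state and tends toward 0 at high temperature, which is why it separates the ordered and disordered phases.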

*This work made use of the Illinois Campus Cluster, a computing resource that is operated by the Illinois Campus Cluster Program (ICCP) in conjunction with the National Center for Supercomputing Applications (NCSA) and which is supported by funds from the University of Illinois at Urbana-Champaign.

Presenters

  • Dmitry Manning-Coe

    • University of Illinois at Urbana-Champaign

Authors

  • Dmitry Manning-Coe

    • University of Illinois at Urbana-Champaign
  • Jacopo Gliozzi

    • University of Illinois at Urbana-Champaign
  • Alexander G Stapleton

    • Queen Mary University of London
  • Edward Hirst

    • Queen Mary University of London
  • Marc Klinger

    • University of Illinois at Urbana-Champaign
  • Giuseppe de Tomasi

    • University of Illinois at Urbana-Champaign
  • David S Berman

    • Queen Mary University of London