The Onset of Variance-Limited Behavior for Neural Networks at Finite Width and Sample Size

ORAL

Abstract

For small training set sizes, the generalization error of wide neural networks is well approximated by the error of an infinite-width neural network. However, beyond a certain training set size, the generalization of a finite-width network begins to worsen relative to the infinite-width performance. We empirically study the transition from infinite-width behavior to this variance-limited regime as a function of training set size, network width, and initialization scale. We find that, for polynomial regression with ReLU networks, finite-size effects can become relevant at very small dataset sizes, scaling as the square root of the width. We discuss the source of this finite-size behavior in terms of the variance of the network's final neural tangent kernel (NTK). Using this, we construct a toy model that exhibits the same scaling and shows sample-size-dependent benefits from feature learning.
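The abstract attributes finite-width effects to the variance of the NTK. As a minimal sketch of that ingredient only (not the authors' method or toy model), the NumPy code below computes the empirical NTK of a two-layer ReLU network and estimates its entry-wise variance across random initializations. Assumptions: the architecture f(x) = a · relu(W x) / sqrt(N) with standard Gaussian weights, unit-norm random inputs, and the helper names `empirical_ntk` and `ntk_variance` are hypothetical choices for illustration. Note also that the abstract concerns the final (trained) NTK, whereas this sketch only measures the NTK at initialization.

```python
import numpy as np


def empirical_ntk(X, W, a):
    """Empirical NTK of the hypothetical model f(x) = a . relu(W x) / sqrt(N).

    X: (P, D) inputs; W: (N, D) first-layer weights; a: (N,) readout weights.
    Returns the (P, P) matrix sum_theta df(x)/dtheta * df(x')/dtheta.
    """
    N = W.shape[0]
    pre = X @ W.T                      # (P, N) preactivations w_i . x
    phi = np.maximum(pre, 0.0)         # ReLU activations
    dphi = (pre > 0).astype(float)     # ReLU derivative
    # Readout gradients: df/da_i = relu(w_i . x) / sqrt(N)
    k_a = phi @ phi.T / N
    # First-layer gradients: df/dw_i = a_i relu'(w_i . x) x / sqrt(N)
    g = dphi * a                       # (P, N) entries a_i relu'(w_i . x)
    k_w = (g @ g.T) * (X @ X.T) / N
    return k_a + k_w


def ntk_variance(widths, P=10, D=5, seeds=200, rng_seed=0):
    """Mean entry-wise variance of the initial NTK over random seeds, per width."""
    rng = np.random.default_rng(rng_seed)
    X = rng.standard_normal((P, D))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # assumed unit-norm inputs
    out = {}
    for N in widths:
        Ks = np.stack([
            empirical_ntk(X, rng.standard_normal((N, D)), rng.standard_normal(N))
            for _ in range(seeds)
        ])
        out[N] = Ks.var(axis=0).mean()
    return out


if __name__ == "__main__":
    for N, v in ntk_variance([64, 256, 1024]).items():
        print(f"N={N:5d}  var={v:.3e}  N*var={N * v:.3f}")
```

Because each NTK entry in this model is an average of N independent per-neuron terms, its variance falls off as 1/N, so `N*var` should stay roughly constant across widths. This illustrates only the width dependence of kernel fluctuations at initialization; it does not reproduce the square-root-of-width dataset-size scaling reported above.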

*This work was supported by NSF grant DMS-2134157. AA acknowledges support from the NDSEG Fellowship and a Hertz Fellowship.

Publication: Planned paper "The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes".

Presenters

  • Alexander B Atanasov

    • Harvard University

Authors

  • Alexander B Atanasov

    • Harvard University
  • Cengiz Pehlevan

    • Harvard University
  • Blake Bordelon

    • Harvard University
  • Sabarish Sainathan

    • Harvard University