Bounds on learning with power-law priors
ORAL
Abstract
Modern machine-learning architectures often achieve good generalization despite having enough parameters to express any function on the training data. This is surprising, since such flexibility suggests they should "overfit" and generalize poorly. To generalize well in the regime where any function can be expressed, a learning machine must have a good "inductive bias": although any function may be expressed, some must be strongly disfavored. We study the inductive biases of many expressive classifiers through the distribution of functions produced by random parameter values, a proxy for their induced Bayesian priors and the corresponding inductive bias. These experiments reveal a universal power-law, "Zipfian" prior in the space of functions. Here we rationalize the universality of this prior by studying the implications of power-law tails in the prior for Bayesian learning in the overparameterized regime. We show that any tail broader than Zipfian implies that a learning machine will fail to generalize on unseen data, while a narrower tail limits the number of functions that can be learned. This implies that the type of prior distribution seen in commonly used learning machines is the only type of prior that allows successful learning in the overparameterized regime.
* This work was supported in part by NSF, NIH, and the Simons Foundation
Presenters
-
Sean A Ridout
Emory University
Authors
-
Sean A Ridout
Emory University
-
Ilya M Nemenman
Emory University
-
Ard A Louis
University of Oxford
-
Chris Mingard
University of Oxford
-
Radosław Grabarczyk
University of Oxford
-
Kamaludin Dingle
Gulf University for Science & Technology
-
Guillermo Valle Pérez
University of Oxford
-
Charles London
University of Oxford