Towards measuring generalization performance of deep neural networks via the Fisher information matrix
ORAL
Abstract
The problem of generalization in deep neural networks (DNNs) with many parameters is still not well understood. In particular, there is clear empirical evidence that DNNs generalize well even in the overparameterized regime, where the networks have many more parameters than there are training examples, and generically do not overfit the training data. While some measures, such as flatness of the minimum found by the optimizer (Jiang et al. 2019), have been shown empirically to correlate well with the generalization ability of a model, these measures often work only in particular regimes (Kaur et al. 2022). Here, we aim to construct a generalization measure based on the Fisher information matrix of a model, which we show is tractable to compute even for large models. We provide theoretically motivated intuition for why our Fisher-based measure should be predictive of generalization. Further, we evaluate our measure across a variety of settings and hyperparameter choices, and compare it to traditional generalization measures such as flatness of the loss function. We show that our measure predicts generalization performance across a range of settings.
* CWG and DJS were supported by the NSF through the CPBF (PHY-1734030). DJS was also supported by a Simons Fellowship in MMLS and a Sloan Foundation Fellowship.
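The abstract does not specify the exact Fisher-based estimator the authors use. As a minimal sketch of one standard quantity of this kind that is tractable for large models, the snippet below estimates the trace of the Fisher information matrix, tr(F) = E_x E_{y~p(y|x)} ||∇_θ log p_θ(y|x)||², by Monte Carlo, sampling labels from the model's own predictive distribution. The function name `fisher_trace` and the `model`/`loader` arguments are placeholders, not the authors' code.

```python
# Hypothetical sketch: Monte Carlo estimate of the Fisher trace for a
# classifier. This illustrates why Fisher-based quantities are tractable:
# tr(F) needs only per-example gradient norms, never the full P x P matrix.
import torch
import torch.nn.functional as F

def fisher_trace(model, loader, device="cpu"):
    """Estimate tr(F) = E_x E_{y~p(y|x)} ||grad_theta log p(y|x)||^2."""
    model.eval()
    params = [p for p in model.parameters() if p.requires_grad]
    total, count = 0.0, 0
    for x, _ in loader:
        for xi in x.to(device):
            logits = model(xi.unsqueeze(0))
            # Sample a label from the model's predictive distribution so the
            # expectation matches the (true) Fisher, not the empirical one.
            y = torch.distributions.Categorical(logits=logits).sample()
            nll = F.cross_entropy(logits, y)
            grads = torch.autograd.grad(nll, params)
            total += sum((g ** 2).sum().item() for g in grads)
            count += 1
    return total / count
```

One backward pass per example suffices, so the cost scales linearly in the number of examples and parameters; this is what makes such measures feasible for models where forming the full Fisher matrix would be impossible.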
Presenters
- Chase W Goddard (Princeton University)

Authors
- Chase W Goddard (Princeton University)
- David J Schwab (The Graduate Center, CUNY)