Explaining Near-Zero Hessian Eigenvalues Through Approximate Symmetries in Neural Networks
ORAL
Abstract
The Hessian of the training loss encodes the local geometry of the loss landscape and its interplay with optimization, thereby shaping model design and generalization. Empirically, deep networks exhibit a spectrum with a few large outliers and a broad bulk of near-zero modes. We argue that the bulk structure reflects (approximate) dataset-independent symmetries of the parameterization. We first show that in deep fully connected linear networks, exact continuous symmetries – e.g., infinitesimal inter-layer rotations – generate flat directions and thus zero Hessian eigenvalues. Extending this, we propose that nonlinear networks inherit approximate versions of these symmetries, which are weakly broken by nonlinear activation functions, producing many small but finite eigenvalues. We demonstrate this mechanism in a two-layer ReLU student-teacher model and in a multi-layer network trained on CIFAR-10, where low-eigenvalue eigenvectors align with symmetry directions. Finally, we apply the same analysis to a convolutional architecture, indicating the generality of the symmetry-based explanation. Together, these results link the Hessian bulk to weakly broken symmetries and clarify the origin of outliers versus near-zero modes of the Hessian.
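The link between an exact continuous symmetry and a zero Hessian eigenvalue can be sketched in a few lines. The derivation below is an illustrative reconstruction, not taken from the abstract: it assumes a two-layer linear network with squared loss, and the symbols $W_1$, $W_2$, $A$, $\theta$, and $v$ are introduced here only for this sketch. The paper's own setup (deeper networks, its specific choice of generators) may differ.

```latex
% Assumed setup: two-layer linear network f(x) = W_2 W_1 x with squared loss
\[
  L(W_1, W_2) = \frac{1}{2n} \sum_{i=1}^{n} \lVert W_2 W_1 x_i - y_i \rVert^2 .
\]
% Inter-layer symmetry: for any invertible G the product W_2 W_1 is unchanged
\[
  (W_1, W_2) \mapsto (G W_1,\; W_2 G^{-1})
  \quad\Longrightarrow\quad
  L(G W_1, W_2 G^{-1}) = L(W_1, W_2).
\]
% Infinitesimal version: G = exp(eps A); rotations correspond to antisymmetric A
\[
  v(\theta) = (A W_1,\; -W_2 A), \qquad \theta = (W_1, W_2),
  \qquad
  \nabla L(\theta) \cdot v(\theta) = 0 \quad \text{for all } \theta .
\]
% Differentiating this identity with respect to theta, with H the Hessian of L
\[
  H(\theta)\, v(\theta)
  + \Bigl( \frac{\partial v}{\partial \theta} \Bigr)^{\!\top} \nabla L(\theta) = 0 .
\]
% At a critical point the gradient vanishes, so v is a zero mode of the Hessian
\[
  \nabla L(\theta^\ast) = 0
  \quad\Longrightarrow\quad
  H(\theta^\ast)\, v(\theta^\ast) = 0 .
\]
```

Under these assumptions, each independent generator $A$ for which $v(\theta^\ast) \neq 0$ yields a flat direction at the critical point. If a nonlinearity makes the loss only approximately invariant, the first identity holds only approximately, which is consistent with the small but finite eigenvalues described above.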
Presenters
Marcel Kühn (Leipzig University)