Understanding Neural Network Generalizability from the Perspective of Entropy
ORAL
Abstract
Neural Networks (NNs) have shown remarkable success in solving a wide range of machine learning problems, ranging from image recognition to natural-language conversation (e.g., ChatGPT). The generalizability of NNs, which measures an NN's ability to perform well on data unseen during training, is a critical evaluation metric for their usefulness in real-world applications. However, the underlying mechanism that determines NNs' varying degrees of generalizability remains an open question. While it has long been suspected in the machine learning community that NNs at a flatter minimum of the loss-function landscape tend to generalize better, recent works suggest that flatness itself may not be sufficient to determine generalizability. In statistical physics, the flatness of a minimum in an energy landscape can be quantified by entropy. Therefore, we calculated the entropy of the loss-function landscape of NNs using Wang-Landau molecular dynamics and explored the potential correlation between such entropy and NNs' generalizability. By testing on both synthetic and real-world datasets, we found that the entropy-equilibrium state is better than, or at least comparable to, the state reachable via a classical training optimizer (e.g., stochastic gradient descent).
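The abstract computes entropy with Wang-Landau molecular dynamics on real NN loss landscapes. As a rough illustration of the underlying idea only, below is a minimal Wang-Landau Monte Carlo sketch (the simpler, non-MD flavor of the algorithm) that estimates the log density of states ln g(E), i.e., the entropy up to an additive constant, for a toy one-dimensional "loss". Every function name, parameter, and choice of toy loss here is an illustrative assumption, not the authors' implementation.

```python
import math
import random

def wang_landau(loss, x0, step=0.5, e_min=0.0, e_max=2.0, n_bins=20,
                f_init=1.0, f_final=1e-4, flat_tol=0.8, seed=0):
    """Estimate ln g(E) (entropy up to a constant) over energy bins
    [e_min, e_max) for a 1D toy landscape via Wang-Landau Monte Carlo.
    Pedagogical sketch only -- the paper uses Wang-Landau *molecular
    dynamics* on actual neural-network loss landscapes."""
    rng = random.Random(seed)
    ln_g = [0.0] * n_bins      # running estimate of ln g(E) per bin
    hist = [0] * n_bins        # visit histogram for the flatness check
    width = (e_max - e_min) / n_bins

    def bin_of(e):
        return min(n_bins - 1, max(0, int((e - e_min) / width)))

    x, f = x0, f_init
    e = loss(x)
    b = bin_of(e)
    while f > f_final:
        for _ in range(5000):
            x_new = x + rng.uniform(-step, step)
            e_new = loss(x_new)
            if e_min <= e_new < e_max:
                b_new = bin_of(e_new)
                # Accept with probability min(1, g(E_old) / g(E_new)),
                # which biases the walk toward a flat energy histogram.
                if ln_g[b] - ln_g[b_new] >= math.log(rng.random()):
                    x, e, b = x_new, e_new, b_new
            ln_g[b] += f
            hist[b] += 1
        # Flatness check: every visited bin close to the mean visit count.
        visited = [h for h in hist if h > 0]
        if visited and min(visited) > flat_tol * sum(visited) / len(visited):
            f /= 2.0               # refine the modification factor
            hist = [0] * n_bins    # and restart the histogram
    return ln_g

# Toy example: a quadratic "loss" E(x) = x^2. Its exact density of
# states g(E) is proportional to E^(-1/2), so ln g should decrease
# with E, and the estimate should reflect that trend.
ln_g = wang_landau(lambda x: x * x, x0=0.5)
```

For a quadratic well, low-energy bins should end up with larger ln g than high-energy ones; a flatter (higher-entropy) minimum would show a slower decay of ln g near E = 0, which is the quantity the abstract correlates with generalizability.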
* This work is partially supported by the Rudy Ruggles Interdisciplinary Research Award from the WCSU Chapter of Sigma Xi. This work used Bridges-2 GPUs at the Pittsburgh Supercomputing Center through allocation CIS230096 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program.
Publication (preprint): Correlation between entropy and generalizability in a neural network
https://arxiv.org/abs/2207.01996
Presenters
-
Entao Yang
Air Liquide
Authors
-
Entao Yang
Air Liquide
-
Xiaotian Zhang
City University of Hong Kong
-
Ge Zhang
City University of Hong Kong