Understanding Neural Network Generalizability from the Perspective of Entropy

ORAL

Abstract

Neural Networks (NNs) have shown remarkable success in solving a wide range of machine learning problems, from image recognition to natural-language conversation (e.g., ChatGPT). The generalizability of NNs, which measures their ability to perform well on data unseen during training, is a critical evaluation metric for their usefulness in real-world applications. However, the underlying mechanism that determines NNs' varying degrees of generalizability remains an open question. While it has long been suspected in the machine learning community that NNs at a flatter minimum of the loss-function landscape tend to generalize better, recent works suggest that flatness alone may not be sufficient to determine generalizability. In statistical physics, the flatness of a minimum in an energy landscape can be quantified by entropy. We therefore calculated the entropy of the loss-function landscape of NNs using Wang-Landau molecular dynamics and explored the potential correlation between this entropy and NNs' generalizability. Testing on both synthetic and real-world datasets, we found that the entropy-equilibrium state is better than, or at least comparable to, the state reachable via classical training optimizers (e.g., stochastic gradient descent).
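The Wang-Landau idea referenced above can be illustrated on a toy landscape. The sketch below (not the paper's code; the two-parabola "loss", bin count, and flatness threshold are illustrative assumptions) estimates the density of states g(L) of a 1D loss function with one wide minimum and one narrow minimum; the entropy S(L) = ln g(L) then quantifies how much parameter volume sits at each loss level:

```python
import math
import random

random.seed(0)

def loss(x):
    # Toy landscape (an assumption for illustration): a wide, flat minimum
    # near x = -2 and a narrow, steep minimum near x = +2, equal in depth.
    return min(0.05 * (x + 2.0) ** 2, 2.0 * (x - 2.0) ** 2)

n_bins, max_loss = 20, 2.0

def bin_of(l):
    # Map a loss value onto one of n_bins histogram bins over [0, max_loss).
    return min(int(l / max_loss * n_bins), n_bins - 1)

ln_g = [0.0] * n_bins   # running estimate of ln g(L)
hist = [0] * n_bins     # visit histogram for the current modification stage
ln_f, x = 1.0, 0.0      # modification factor and walker position

for _ in range(100):    # safety cap on sweeps
    if ln_f <= 1e-3:
        break
    for _ in range(5000):
        x_new = min(4.0, max(-4.0, x + random.gauss(0.0, 0.5)))
        b_old, b_new = bin_of(loss(x)), bin_of(loss(x_new))
        # Wang-Landau acceptance: favor loss bins whose g is currently
        # underestimated, driving a flat histogram over loss levels.
        if math.log(random.random() + 1e-300) < ln_g[b_old] - ln_g[b_new]:
            x, b_old = x_new, b_new
        ln_g[b_old] += ln_f
        hist[b_old] += 1
    visited = [h for h in hist if h > 0]
    if min(visited) > 0.8 * (sum(visited) / len(visited)):
        ln_f /= 2.0          # histogram is "flat": refine the estimate
        hist = [0] * n_bins

# Entropy of each loss level, up to an additive constant.
S = [g - max(ln_g) for g in ln_g]
```

In the paper's setting the walker moves in the high-dimensional weight space of an NN rather than on a 1D curve, but the flat-histogram mechanism is the same: low-loss states belonging to flatter (higher-entropy) basins occupy more of the accumulated density of states.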

* This work is partially supported by the Rudy Ruggles Interdisciplinary Research Award from the WCSU Chapter of Sigma Xi. This work used Bridges-2 GPUs at the Pittsburgh Supercomputing Center through allocation CIS230096 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program.

Publication (preprint): Correlation between entropy and generalizability in a neural network
https://arxiv.org/abs/2207.01996

Presenters

  • Entao Yang

    Air Liquide

Authors

  • Entao Yang

    Air Liquide

  • Xiaotian Zhang

    City University of Hong Kong

  • Ge Zhang

    City University of Hong Kong