The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold

Itay Griniasty

ORAL · Invited

Abstract

Deep neural networks are capable of exploring arbitrary hypotheses. However, we demonstrate that when paired with a concrete task and algorithm, the training of deep networks explores only a tiny fraction of the space of available hypotheses. To visualize and identify this space, we developed an information-geometric lens. Focusing this lens on our experimental data uncovers that networks with many different architectures, trained with different optimization procedures and regularization techniques, traverse the same manifold. Moreover, networks trained on different tasks also lie on a low-dimensional manifold.

We study the details of this manifold and find that networks with different architectures follow distinguishable trajectories, while other factors have minimal influence: larger networks train along a manifold similar to that of smaller networks, only faster; and networks initialized at very different points in the prediction space converge to solutions along a similar manifold. We analytically predict this phenomenon for linear networks, showing that it critically depends on the structure of the task.
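
The abstract does not spell out how the information-geometric lens is constructed, so the following is only a minimal sketch of one way such a lens could work, under stated assumptions: each network checkpoint is represented by its predicted class probabilities on a fixed set of samples, pairs of checkpoints are compared with a sample-averaged Bhattacharyya distance, and the resulting distance matrix is embedded with a multidimensional-scaling-style eigendecomposition that keeps eigenvalues of either sign. The function names `bhattacharyya_distances` and `embed`, the array shapes, and the random stand-in data are all hypothetical and are not taken from the talk.

```python
import numpy as np


def bhattacharyya_distances(probs):
    """Pairwise distances between models (assumed representation).

    probs: array of shape (n_models, n_samples, n_classes) holding each
    model's predicted class probabilities on the same samples.
    Returns an (n_models, n_models) matrix of Bhattacharyya distances,
    averaged over samples.
    """
    root = np.sqrt(probs)
    # Bhattacharyya coefficient for every pair of models on every sample.
    coeff = np.einsum("isc,jsc->ijs", root, root)
    # Guard against log(0) and against rounding slightly above 1.
    coeff = np.clip(coeff, 1e-12, 1.0)
    return -np.log(coeff).mean(axis=2)


def embed(dist, n_components=3):
    """Embed models from a distance matrix by double centering.

    The centered matrix may have negative eigenvalues, so components are
    ranked by absolute eigenvalue and scaled by sqrt(|eigenvalue|).
    """
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ dist @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(-np.abs(eigvals))[:n_components]
    coords = eigvecs[:, order] * np.sqrt(np.abs(eigvals[order]))
    return coords, eigvals[order]


# Stand-in data: 6 hypothetical checkpoints, 50 samples, 10 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 50, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

dist = bhattacharyya_distances(probs)
coords, eigvals = embed(dist)
print(coords.shape)  # (6, 3)
```

In a sketch like this, each row of `coords` would be one checkpoint, and plotting successive checkpoints of a training run would trace its trajectory. How many components carry most of the total absolute eigenvalue is one rough indicator of whether the points lie near a low-dimensional manifold.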

This work was conducted in collaboration with Pratik Chaudhari (University of Pennsylvania), Jialin Mao (University of Pennsylvania), Rahul Ramesh (University of Pennsylvania), Rubing Yang (University of Pennsylvania), Mark Transtrum (Brigham Young University), Han Kheng Teoh (Cornell University), and James P. Sethna (Cornell University).

*I.G. acknowledges support from the NSF (DMREF-89228, EFRI-1935252) and the Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship. This work was further supported by grants from the NSF (IIS-2145164, CCF-2212519, DMR-1719490, DMR-1753357), the Office of Naval Research (N00014-22-1-2255), the NIH (1R01NS116595-01), and cloud computing credits from Amazon Web Services.

March 18, 2025, 9:12 AM – March 18, 2025, 9:48 AM

Publications:
1. Mao, J., Griniasty, I., Teoh, H.K., Ramesh, R., Yang, R., Transtrum, M.K., Sethna, J.P. and Chaudhari, P., 2024. The training process of many deep networks explores the same low-dimensional manifold. Proceedings of the National Academy of Sciences, 121(12), p.e2310002121.
2. Ramesh, R., Mao, J., Griniasty, I., Yang, R., Teoh, H.K., Transtrum, M., Sethna, J.P. and Chaudhari, P., 2023. A picture of the space of typical learnable tasks. Proceedings of the International Conference on Machine Learning (ICML).

Presenters

Itay Griniasty
- Cornell University

Authors

Itay Griniasty
- Cornell University