The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold
ORAL · Invited
Abstract
We study the details of the low-dimensional manifold explored during training and find that networks with different architectures follow distinguishable trajectories, while other factors have minimal influence: larger networks train along a manifold similar to that of smaller networks, just faster, and networks initialized at very different points in prediction space converge to solutions along a similar manifold. We analytically predict this phenomenon for linear networks, showing that it depends critically on the structure of the task.
This work was conducted in collaboration with Pratik Chaudhari (University of Pennsylvania), Jialin Mao (University of Pennsylvania), Rahul Ramesh (University of Pennsylvania), Rubing Yang (University of Pennsylvania), Mark Transtrum (Brigham Young University), Han Kheng Teoh (Cornell University), and James P. Sethna (Cornell University).
*I.G. acknowledges support from the NSF (DMREF-89228, EFRI-1935252) and the Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship. This work was further supported by grants from the NSF (IIS-2145164, CCF-2212519, DMR-1719490, DMR-1753357), the Office of Naval Research (N00014-22-1-2255), and the NIH (1R01NS116595-01), and by cloud computing credits from Amazon Web Services.
Publications:
1. Mao, J., Griniasty, I., Teoh, H.K., Ramesh, R., Yang, R., Transtrum, M.K., Sethna, J.P. and Chaudhari, P., 2024. The training process of many deep networks explores the same low-dimensional manifold. Proceedings of the National Academy of Sciences, 121(12), p.e2310002121.
2. Ramesh, R., Mao, J., Griniasty, I., Yang, R., Teoh, H.K., Transtrum, M., Sethna, J.P. and Chaudhari, P., 2023. A picture of the space of typical learnable tasks. Proceedings of the International Conference on Machine Learning (ICML).
Presenters
Itay Griniasty (Cornell University)