Abstract
It is well known that the Hessian of the deep loss landscape matters to the optimization and generalization of deep learning. Previous studies reported a rough Hessian structure in deep learning that consists of two components: a small number of large eigenvalues and a large number of nearly zero eigenvalues. To the best of our knowledge, we are the first to report that a simple but overlooked power-law Hessian structure exists in well-trained deep neural networks, including Convolutional Neural Networks (CNNs) and Large Language Models (LLMs). Moreover, we provide a maximum-entropy theoretical interpretation for the power-law Hessian structure and theoretically demonstrate the existence of a robust and low-dimensional subspace of deep neural networks. Our extensive experiments using the proposed power-law spectral method demonstrate that the power-law Hessian spectra critically relate to multiple important behaviors of deep learning, including optimization, generalization, and overparameterization. Notably, we discover that the power-law Hessian structure of a given LLM can effectively predict generalization during training, while conventional sharpness-based generalization measures that often work well on CNNs become nearly useless as generalization predictors for LLMs.
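As an illustration of what a power-law spectrum means in practice, the sketch below fits a straight line to eigenvalues on a log-log scale. It is not the paper's power-law spectral method. It assumes a rank-based form, lambda_k ≈ lambda_1 · k^(-s), which the abstract does not spell out, and it runs on synthetic eigenvalues rather than on a real Hessian. The function name `fit_power_law` and its parameters are hypothetical.

```python
import numpy as np


def fit_power_law(eigenvalues, top_k=None):
    """Fit log(lambda_k) = log(lambda_1) - s * log(k) by least squares.

    Assumed rank-based power law (an illustration, not the paper's method).
    Returns (s, lambda_1_estimate, r_squared).
    """
    # Sort in descending order so that index k-1 holds the k-th largest eigenvalue.
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    # Logarithms need strictly positive values, so drop the rest.
    lam = lam[lam > 0]
    if top_k is not None:
        lam = lam[:top_k]

    ranks = np.arange(1, lam.size + 1)
    log_k, log_lam = np.log(ranks), np.log(lam)

    # A degree-1 polynomial fit returns (slope, intercept).
    slope, intercept = np.polyfit(log_k, log_lam, 1)

    # Coefficient of determination of the log-log linear fit.
    residuals = log_lam - (slope * log_k + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((log_lam - log_lam.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot

    return -slope, float(np.exp(intercept)), float(r_squared)


# Synthetic spectrum that follows the assumed law exactly (made-up numbers).
ranks = np.arange(1, 201)
synthetic_eigenvalues = 5.0 * ranks ** -1.2

s, lam1, r2 = fit_power_law(synthetic_eigenvalues)
print(f"s = {s:.3f}, lambda_1 = {lam1:.3f}, R^2 = {r2:.4f}")
```

For a real network, the top eigenvalues would first have to be estimated, for example with a Lanczos-type method on Hessian-vector products. A fit that stays close to linear on the log-log scale would then be consistent with a power-law structure under the assumed form.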
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 42nd International Conference on Machine Learning, ICML 2025 |
| Publisher | ML Research Press |
| Pages | 58805-58831 |
| Number of pages | 27 |
| Publication status | Published - Jul 2025 |
| Event | 42nd International Conference on Machine Learning, ICML 2025, Vancouver Convention Center, Vancouver, Canada. Duration: 13 Jul 2025 → 19 Jul 2025. Conference website: https://icml.cc/Conferences/2025. Conference calendar: https://icml.cc/virtual/2025/calendar. Conference proceedings: https://proceedings.mlr.press/v267/ |
Publication series
| Name | Proceedings of Machine Learning Research |
|---|---|
| Publisher | ML Research Press |
| Volume | 267 |
Conference
| Conference | 42nd International Conference on Machine Learning, ICML 2025 |
|---|---|
| Country/Territory | Canada |
| City | Vancouver |
| Period | 13/07/25 → 19/07/25 |