Investigating the Overlooked Hessian Structure: From CNNs to LLMs

  • Qian Yuan Tang
  • , Yufei Gu
  • , Yunfeng Cai
  • , Mingming Sun
  • , Ping Li
  • , Xun Zhou
  • , Zeke Xie*
  • *Corresponding author for this work

Research output: Chapter in book/report/conference proceedingConference proceedingpeer-review

Abstract

It is well-known that the Hessian of deep loss landscape matters to optimization and generalization of deep learning. Previous studies reported a rough Hessian structure in deep learning, which consists of two components, a small number of large eigenvalues and a large number of nearly-zero eigenvalues. To the best of our knowledge, we are the first to report that a simple but overlooked power-law Hessian structure exists in well-trained deep neural networks, including Convolutional Neural Networks (CNNs) and Large Language Models (LLMs). Moreover, we provide a maximum-entropy theoretical interpretation for the power-law Hessian structure and theoretically demonstrate the existence of robust and low-dimensional subspace of deep neural networks. Our extensive experiments using the proposed power-law spectral method demonstrate that the power-law Hessian spectra critically relate to multiple important behaviors of deep learning, including optimization, generalization, and overparameterization. Notably, we discover that the power-law Hessian structure of a given LLM can effectively predict generalization during training, while conventional sharpness-based generalization measures that often works well on CNNs become nearly useless for as a generalization predictor of LLMs.

Original languageEnglish
Title of host publicationProceedings of the 42nd International Conference on Machine Learning, ICML 2025
PublisherML Research Press
Pages58805-58831
Number of pages27
Publication statusPublished - Jul 2025
Event42nd International Conference on Machine Learning, ICML 2025 - Vancouver Convention Center, Vancouver, Canada
Duration: 13 Jul 202519 Jul 2025
https://icml.cc/Conferences/2025 (Conference Website)
https://icml.cc/virtual/2025/calendar (Conference Calendar)
https://proceedings.mlr.press/v267/ (Conference Proceedings)

Publication series

NameProceedings of Machine Learning Research
PublisherML Research Press
Volume267

Conference

Conference42nd International Conference on Machine Learning, ICML 2025
Country/TerritoryCanada
CityVancouver
Period13/07/2519/07/25
Internet address

Fingerprint

Dive into the research topics of 'Investigating the Overlooked Hessian Structure: From CNNs to LLMs'. Together they form a unique fingerprint.

Cite this