Learning Rates for Stochastic Gradient Descent with Nonconvex Objectives

Yunwen Lei, Ke Tang*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

28 Citations (Scopus)

Abstract

Stochastic gradient descent (SGD) has become the method of choice for training highly complex and nonconvex models, since it not only recovers good solutions that minimize training errors but also generalizes well. In the literature, the computational and statistical properties of SGD are typically studied separately, and studies that consider them jointly in a nonconvex learning setting are lacking. In this paper, we develop novel learning rates of SGD for nonconvex learning by presenting high-probability bounds for both the computational and statistical errors. We show that the complexity of the SGD iterates grows in a controllable manner with respect to the iteration number, which sheds insight on how implicit regularization can be achieved by tuning the number of passes to balance the computational and statistical errors. As a byproduct, we also slightly refine existing studies on the uniform convergence of gradients by showing its connection to Rademacher chaos complexities.
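A minimal sketch of the idea described above, not the paper's actual algorithm or analysis: plain SGD on a toy nonconvex objective, where the number of passes over the data is tuned on a held-out set, so that too few passes leave a large optimization error while too many passes overfit the noise. The toy model, data, and all parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from y = 2*tanh(0.5*x) + noise, split into train/validation.
x = rng.uniform(-3, 3, size=400)
y = 2.0 * np.tanh(0.5 * x) + 0.1 * rng.standard_normal(400)
x_tr, y_tr, x_va, y_va = x[:300], y[:300], x[300:], y[300:]

def loss(a, b, xs, ys):
    # Squared loss of the nonconvex model x -> a * tanh(b * x).
    return 0.5 * np.mean((a * np.tanh(b * xs) - ys) ** 2)

def sgd(num_passes, eta=0.05):
    """Plain SGD with a constant step size; the pass count is the tuning knob."""
    a, b = 0.5, 0.1                      # arbitrary initialization
    for _ in range(num_passes):
        for i in rng.permutation(len(x_tr)):
            t = np.tanh(b * x_tr[i])
            r = a * t - y_tr[i]          # residual on one sample
            grad_a = r * t               # d(loss)/d(a)
            grad_b = r * a * x_tr[i] * (1 - t ** 2)  # d(loss)/d(b)
            a -= eta * grad_a
            b -= eta * grad_b
    return a, b

# Choose the number of passes by validation loss: this is the early-stopping
# style of implicit regularization that balances the two error sources.
for passes in (1, 5, 20, 100):
    a, b = sgd(passes)
    print(passes, loss(a, b, x_tr, y_tr), loss(a, b, x_va, y_va))
```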

Original language: English
Pages (from-to): 4505-4511
Number of pages: 7
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 43
Issue number: 12
DOIs
Publication status: Published - 1 Dec 2021

Scopus Subject Areas

  • Software
  • Computer Vision and Pattern Recognition
  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Applied Mathematics

User-Defined Keywords

  • early stopping
  • learning rates
  • nonconvex optimization
  • stochastic gradient descent

