TY - JOUR
T1 - Learning Rates for Stochastic Gradient Descent with Nonconvex Objectives
AU - Lei, Yunwen
AU - Tang, Ke
N1 - Funding information:
The authors are grateful to the editor and referees for their constructive comments, which were very helpful in improving the paper. This work was supported in part by the Guangdong Provincial Key Laboratory under Grant 2020B121201001, the Program for Guangdong Introducing Innovative and Entrepreneurial Teams under Grant 2017ZT07X386, the National Natural Science Foundation of China under Grant 61806091, the Science and Technology Commission of Shanghai Municipality under Grant 19511120600, and the MOE University Scientific-Technological Innovation Plan Program.
Publisher Copyright:
© 2021 IEEE.
PY - 2021/12/1
Y1 - 2021/12/1
N2 - Stochastic gradient descent (SGD) has become the method of choice for training highly complex and nonconvex models since it can not only recover good solutions to minimize training errors but also generalize well. In the literature, the computational and statistical properties of SGD are studied separately to understand its behavior. However, few studies jointly consider these computational and statistical properties in a nonconvex learning setting. In this paper, we develop novel learning rates of SGD for nonconvex learning by presenting high-probability bounds for both computational and statistical errors. We show that the complexity of SGD iterates grows in a controllable manner with respect to the iteration number, which sheds insight on how implicit regularization can be achieved by tuning the number of passes to balance the computational and statistical errors. As a byproduct, we also slightly refine the existing studies on the uniform convergence of gradients by showing its connection to Rademacher chaos complexities.
AB - Stochastic gradient descent (SGD) has become the method of choice for training highly complex and nonconvex models since it can not only recover good solutions to minimize training errors but also generalize well. In the literature, the computational and statistical properties of SGD are studied separately to understand its behavior. However, few studies jointly consider these computational and statistical properties in a nonconvex learning setting. In this paper, we develop novel learning rates of SGD for nonconvex learning by presenting high-probability bounds for both computational and statistical errors. We show that the complexity of SGD iterates grows in a controllable manner with respect to the iteration number, which sheds insight on how implicit regularization can be achieved by tuning the number of passes to balance the computational and statistical errors. As a byproduct, we also slightly refine the existing studies on the uniform convergence of gradients by showing its connection to Rademacher chaos complexities.
KW - early stopping
KW - learning rates
KW - nonconvex optimization
KW - stochastic gradient descent
UR - http://www.scopus.com/inward/record.url?scp=85103253583&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2021.3068154
DO - 10.1109/TPAMI.2021.3068154
M3 - Journal article
C2 - 33755555
AN - SCOPUS:85103253583
SN - 0162-8828
VL - 43
SP - 4505
EP - 4511
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 12
ER -