TY - JOUR
T1 - Generalization analysis of deep CNNs under maximum correntropy criterion
AU - Zhang, Yingqiao
AU - Fang, Zhiying
AU - Fan, Jun
N1 - The work by Zhiying Fang is supported by the Research Foundation of Shenzhen Polytechnic University [Project No. 6023312031K] and the Post-doctoral Later-stage Foundation Project of Shenzhen Polytechnic University [Project No. 6023271019K]. The work by Jun Fan is partially supported by the Research Grants Council of Hong Kong [Project No. HKBU12302819] and [Project No. HKBU12303220], and Hong Kong Baptist University [Project No. RC-FNRA-IG/22-23/SCI/02].
Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2024/6
Y1 - 2024/6
N2 - Convolutional neural networks (CNNs) have gained immense popularity in recent years, finding utility in diverse fields such as image recognition, natural language processing, and bioinformatics. Despite the remarkable progress made in deep learning theory, most studies on CNNs, especially for regression tasks, rely heavily on the least squares loss function. However, there are situations where such learning algorithms may not suffice, particularly in the presence of heavy-tailed noise or outliers. This predicament emphasizes the necessity of exploring alternative loss functions that can handle such scenarios more effectively, thereby unleashing the true potential of CNNs. In this paper, we investigate the generalization error of deep CNNs with the rectified linear unit (ReLU) activation function for robust regression problems within an information-theoretic learning framework. Our study demonstrates that when the regression function exhibits an additive ridge structure and the noise possesses a finite p-th moment, the empirical risk minimization scheme generated by the maximum correntropy criterion and deep CNNs achieves fast convergence rates. Notably, these rates match, up to a logarithmic factor, the minimax optimal convergence rates attained by fully connected neural network models with the Huber loss function. We further establish the convergence rates of deep CNNs under the maximum correntropy criterion when the regression function resides in a Sobolev space on the sphere.
KW - Convergence rates
KW - Convolutional neural networks
KW - Heavy-tailed noises
KW - Information-theoretic learning
KW - Learning theory
KW - Maximum correntropy criterion
UR - http://www.scopus.com/inward/record.url?scp=85187803186&partnerID=8YFLogxK
U2 - 10.1016/j.neunet.2024.106226
DO - 10.1016/j.neunet.2024.106226
M3 - Journal article
C2 - 38490117
AN - SCOPUS:85187803186
SN - 0893-6080
VL - 174
JO - Neural Networks
JF - Neural Networks
M1 - 106226
ER -