TY - JOUR
T1 - Data-Dependent Generalization Bounds for Multi-Class Classification
AU - Lei, Yunwen
AU - Dogan, Ürün
AU - Zhou, Ding-Xuan
AU - Kloft, Marius
N1 - Funding Information:
Y. Lei was supported in part by the National Key Research and Development Program of China under Grant 2017YFC0804003, in part by the National Natural Science Foundation of China under Grant 61806091, in part by the Shenzhen Peacock Plan under Grant KQTD2016112514355531, in part by the Science and Technology Innovation Committee Foundation of Shenzhen under Grant ZDSYS201703031748284, and in part by a Humboldt Research Fellowship from the Alexander von Humboldt Foundation. D.-X. Zhou was supported in part by the NSFC/RGC Joint Research Scheme through RGC under Project N_CityU120/14 and in part by NSFC under Project 11461161006. M. Kloft was supported in part by the German Research Foundation under Award KL 2698/2-1 and Award GRK1589/2 and in part by the German Federal Ministry of Education and Research (BMBF) under Award 031L0023A and Award 01IS18051A. This paper was presented in part at the 2015 Conference on Neural Information Processing Systems (NIPS 2015).
Publisher Copyright:
© 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
AB - In this paper, we study data-dependent generalization error bounds that exhibit a mild dependency on the number of classes, making them suitable for multi-class learning with a large number of label classes. The bounds generally hold for empirical multi-class risk minimization algorithms using an arbitrary norm as the regularizer. Key to our analysis are new structural results for multi-class Gaussian complexities and empirical ℓ∞-norm covering numbers, which exploit the Lipschitz continuity of the loss function with respect to the ℓ2- and ℓ∞-norms, respectively. We establish data-dependent error bounds in terms of the complexities of a linear function class defined on a finite set induced by the training examples, for which we show tight lower and upper bounds. We apply the results to several prominent multi-class learning machines and show a tighter dependency on the number of classes than the state of the art. For instance, for the multi-class support vector machine of Crammer and Singer (2002), we obtain a data-dependent bound with a logarithmic dependency on the number of classes, a significant improvement over the previous square-root dependency. Experimental results are reported to verify our theoretical findings.
KW - covering numbers
KW - Gaussian complexities
KW - generalization error bounds
KW - Multi-class classification
KW - Rademacher complexities
UR - http://www.scopus.com/inward/record.url?scp=85065143828&partnerID=8YFLogxK
DO - 10.1109/TIT.2019.2893916
M3 - Journal article
AN - SCOPUS:85065143828
SN - 0018-9448
VL - 65
SP - 2995
EP - 3021
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
IS - 5
ER -