TY - JOUR
T1 - Estimating Per-Class Statistics for Label Noise Learning
AU - Luo, Wenshui
AU - Chen, Shuo
AU - Liu, Tongliang
AU - Han, Bo
AU - Niu, Gang
AU - Sugiyama, Masashi
AU - Tao, Dacheng
AU - Gong, Chen
N1 - This work was supported in part by the NSF of China under Grant 62336003, Grant 12371510, and Grant 62376235, in part by the NSF of Jiangsu Province under Grant BZ2021013, in part by the NSF for Distinguished Young Scholar of Jiangsu Province under Grant BK20220080, in part by 111 Program under Grant B13022, in part by Guangdong Basic and Applied Basic Research Foundation under Grant 2022A1515011652 and Grant 2024A1515012399, in part by HKBU Faculty Niche Research Areas under Grant RC-FNRA-IG/22-23/SCI/04, and in part by HKBU CSD Departmental Incentive Grant.
Publisher Copyright:
© 2024 IEEE
PY - 2025/1
Y1 - 2025/1
N2 - Real-world data may contain a considerable amount of noisily labeled examples, which usually mislead the training algorithm and result in degraded classification performance on test data. Therefore, Label Noise Learning (LNL) was proposed, in which one popular research direction focuses on estimating critical statistics (e.g., the sample mean and sample covariance) to recover the clean data distribution. However, existing methods may suffer from an unreliable sample selection process or can hardly be applied to multi-class cases. Inspired by centroid estimation theory, we propose Per-Class Statistic Estimation (PCSE), which establishes the quantitative relationship between the clean (first-order and second-order) statistics and the corresponding noisy statistics for every class. This relationship is further utilized to induce a generative classifier for model inference. Unlike existing methods, our approach does not require sample selection at the instance level. Moreover, PCSE can serve as a general post-processing strategy applicable to various popular networks pre-trained on the noisy dataset to boost their classification performance. Theoretically, we prove that the estimated statistics converge to their ground-truth values as the sample size increases, even if the label transition matrix is biased. Empirically, we conducted extensive experiments on various binary and multi-class datasets, and the results demonstrate that PCSE achieves more precise statistic estimation as well as higher classification accuracy compared with state-of-the-art methods in LNL.
AB - Real-world data may contain a considerable amount of noisily labeled examples, which usually mislead the training algorithm and result in degraded classification performance on test data. Therefore, Label Noise Learning (LNL) was proposed, in which one popular research direction focuses on estimating critical statistics (e.g., the sample mean and sample covariance) to recover the clean data distribution. However, existing methods may suffer from an unreliable sample selection process or can hardly be applied to multi-class cases. Inspired by centroid estimation theory, we propose Per-Class Statistic Estimation (PCSE), which establishes the quantitative relationship between the clean (first-order and second-order) statistics and the corresponding noisy statistics for every class. This relationship is further utilized to induce a generative classifier for model inference. Unlike existing methods, our approach does not require sample selection at the instance level. Moreover, PCSE can serve as a general post-processing strategy applicable to various popular networks pre-trained on the noisy dataset to boost their classification performance. Theoretically, we prove that the estimated statistics converge to their ground-truth values as the sample size increases, even if the label transition matrix is biased. Empirically, we conducted extensive experiments on various binary and multi-class datasets, and the results demonstrate that PCSE achieves more precise statistic estimation as well as higher classification accuracy compared with state-of-the-art methods in LNL.
KW - Label noise
KW - Statistic estimation
KW - Unbiasedness
UR - http://www.scopus.com/inward/record.url?scp=86000384044&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2024.3466182
DO - 10.1109/TPAMI.2024.3466182
M3 - Journal article
SN - 1939-3539
VL - 47
SP - 305
EP - 322
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 1
ER -