Abstract
Real-world data may contain a considerable number of noisily labeled examples, which usually mislead the training algorithm and degrade classification performance on test data. Label Noise Learning (LNL) was proposed to address this problem, and one popular line of research focuses on estimating critical statistics (e.g., the sample mean and sample covariance) to recover the clean data distribution. However, existing methods may suffer from an unreliable sample selection process or can hardly be applied to multi-class cases. Inspired by centroid estimation theory, we propose Per-Class Statistic Estimation (PCSE), which establishes the quantitative relationship between the clean first-order and second-order statistics and the corresponding noisy statistics for every class. This relationship is further used to induce a generative classifier for model inference. Unlike existing methods, our approach does not require sample selection at the instance level. Moreover, PCSE can serve as a general post-processing strategy applicable to various popular networks pre-trained on the noisy dataset to boost their classification performance. Theoretically, we prove that the estimated statistics converge to their ground-truth values as the sample size increases, even if the label transition matrix is biased. Empirically, we conducted extensive experiments on various binary and multi-class datasets, and the results demonstrate that PCSE achieves more precise statistic estimation as well as higher classification accuracy than state-of-the-art LNL methods.
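
To make the idea of relating noisy per-class statistics to clean ones concrete, the following is a minimal toy sketch and not the PCSE estimator itself, whose exact form is not given in this abstract. It assumes class-conditional label noise, i.e. that features are independent of the noisy label given the clean label, and it assumes that a backward transition matrix `backward_T` with entries P(clean = i | noisy = j) is known exactly. Under those assumptions, each noisy class mean and second moment is a `backward_T`-weighted mixture of the clean ones, so the clean statistics can be recovered by solving a linear system. The function name `recover_clean_moments` and all variable names are hypothetical.

```python
import numpy as np


def recover_clean_moments(noisy_means, noisy_second, backward_T):
    """Recover clean per-class means and covariances from noisy ones.

    Assumption (not from the paper): backward_T[j, i] = P(clean=i | noisy=j)
    is known and invertible, and the noise is class-conditional.

    noisy_means:  (K, d)    mean of features carrying noisy label j
    noisy_second: (K, d, d) second moment E[x x^T] for noisy label j
    """
    K, d = noisy_means.shape
    # Noisy mean j = sum_i backward_T[j, i] * clean mean i, so solve for clean.
    clean_means = np.linalg.solve(backward_T, noisy_means)
    # The same mixture relation holds entry-wise for the second moments.
    flat = np.linalg.solve(backward_T, noisy_second.reshape(K, d * d))
    clean_second = flat.reshape(K, d, d)
    # Covariance = second moment minus outer product of the mean.
    outer = np.einsum("ki,kj->kij", clean_means, clean_means)
    return clean_means, clean_second - outer


# Toy population statistics for three classes in two dimensions.
true_means = np.array([[0.0, 0.0], [3.0, 1.0], [-2.0, 4.0]])
true_covs = np.stack([
    np.eye(2),
    2.0 * np.eye(2),
    np.array([[1.0, 0.5], [0.5, 1.5]]),
])
true_second = true_covs + np.einsum("ki,kj->kij", true_means, true_means)

# Hypothetical backward transition matrix (rows sum to 1).
backward_T = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
])

# Noisy statistics are mixtures of the clean ones.
noisy_means = backward_T @ true_means
noisy_second = np.einsum("jk,kab->jab", backward_T, true_second)

est_means, est_covs = recover_clean_moments(noisy_means, noisy_second, backward_T)
```

In practice the noisy statistics would be sample estimates computed from network features and the transition matrix would itself be estimated, which is where the convergence claim in the abstract becomes relevant. The recovered means and covariances could then parameterise a Gaussian generative classifier, which is one possible reading of the "generative classifier" mentioned above.
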
Original language | English |
---|---|
Article number | 10689264 |
Number of pages | 17 |
Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
DOIs | |
Publication status | E-pub ahead of print - 23 Sept 2024 |
Scopus Subject Areas
- Software
- Artificial Intelligence
- Applied Mathematics
- Computer Vision and Pattern Recognition
- Computational Theory and Mathematics
User-Defined Keywords
- Label noise
- Statistic estimation
- Unbiasedness