Estimating Per-Class Statistics for Label Noise Learning

Wenshui Luo, Shuo Chen*, Tongliang Liu, Bo Han, Gang Niu, Masashi Sugiyama, Dacheng Tao, Chen Gong*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

Abstract

Real-world data may contain a considerable number of noisily labeled examples, which usually mislead the training algorithm and degrade classification performance on test data. Label Noise Learning (LNL) was proposed to address this, and one popular line of research focuses on estimating critical statistics (e.g., the sample mean and sample covariance) to recover the clean data distribution. However, existing methods may suffer from an unreliable sample selection process or can hardly be applied to multi-class cases. Inspired by centroid estimation theory, we propose Per-Class Statistic Estimation (PCSE), which establishes the quantitative relationship between the clean (first-order and second-order) statistics and the corresponding noisy statistics for every class. This relationship is further utilized to induce a generative classifier for model inference. Unlike existing methods, our approach does not require sample selection at the instance level. Moreover, PCSE can serve as a general post-processing strategy applicable to various popular networks pre-trained on the noisy dataset, boosting their classification performance. Theoretically, we prove that the estimated statistics converge to their ground-truth values as the sample size increases, even if the label transition matrix is biased. Empirically, we conduct extensive experiments on various binary and multi-class datasets, and the results demonstrate that PCSE achieves more precise statistic estimation as well as higher classification accuracy than state-of-the-art LNL methods.
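
The abstract does not spell out the PCSE estimator, so the following is only a minimal sketch of the general idea of relating noisy per-class statistics to clean ones, not the authors' method. It assumes class-conditional (instance-independent) label noise, under which each noisy class mean is a convex mixture of the clean class means, and it assumes the mixing weights are known exactly. The function name `recover_clean_means`, the matrix `T`, and the synthetic Gaussian data are all hypothetical choices made for illustration.

```python
import numpy as np

def recover_clean_means(noisy_means, mixing):
    """Solve noisy_means = mixing @ clean_means for clean_means.

    noisy_means: (C, d) array, mean of the samples carrying noisy label j.
    mixing:      (C, C) array, mixing[j, i] = P(clean = i | noisy = j).
    Hypothetical helper for illustration only.
    """
    return np.linalg.solve(mixing, noisy_means)

rng = np.random.default_rng(0)

# Synthetic clean data: three unit-variance Gaussian classes in 2-D.
clean_means = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]])
n_per_class = 20000
X = np.concatenate(
    [rng.normal(m, 1.0, size=(n_per_class, 2)) for m in clean_means]
)
y = np.repeat(np.arange(3), n_per_class)

# Assumed transition matrix: T[i, j] = P(noisy = j | clean = i).
T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.1, 0.7]])

# Flip labels by inverse-CDF sampling from the row of T for each clean label.
u = rng.random(len(y))
cum = np.cumsum(T[y], axis=1)
noisy_y = np.minimum((u[:, None] > cum).sum(axis=1), 2)

# Bayes' rule turns T and the class priors into P(clean = i | noisy = j).
prior = np.full(3, 1.0 / 3.0)
joint = T * prior[:, None]                      # joint[i, j] = P(clean=i, noisy=j)
mixing = (joint / joint.sum(axis=0, keepdims=True)).T

# Per-class means computed from noisy labels are biased toward other classes.
noisy_means = np.stack([X[noisy_y == j].mean(axis=0) for j in range(3)])

# Inverting the linear relationship removes that bias.
estimated_means = recover_clean_means(noisy_means, mixing)
```

Under the same assumption, the per-class second moments E[xxᵀ] mix linearly in the same way, so a covariance estimate could be obtained by the same solve followed by subtracting the outer product of the recovered mean. No instance-level sample selection is involved in this sketch; only aggregated per-class statistics are used.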
Original language: English
Article number: 10689264
Number of pages: 17
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication status: E-pub ahead of print - 23 Sept 2024

Scopus Subject Areas

  • Software
  • Artificial Intelligence
  • Applied Mathematics
  • Computer Vision and Pattern Recognition
  • Computational Theory and Mathematics

User-Defined Keywords

  • Label noise
  • Statistic estimation
  • Unbiasedness
