RML++: Regroup Median Loss for Combating Label Noise

Fengpeng Li, Kemou Li, Qizhou Wang, Bo Han, Jinyu Tian, Jiantao Zhou*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

Abstract

Training deep neural networks (DNNs) typically necessitates large-scale, high-quality annotated datasets. However, due to the inherent challenges of precisely annotating vast numbers of training samples, label noise—characterized by potentially erroneous annotations—is common yet detrimental in practice. Currently, to combat the negative impacts of label noise, mainstream studies follow a pipeline that begins with data sampling and is followed by loss correction. Data sampling aims to partition the original training dataset into clean and noisy subsets, but it often suffers from biased sampling that can mislead models. Additionally, loss correction typically requires knowledge of the noise rate as a priori information, whose precise estimation can be challenging. To this end, we propose a novel method, Regroup Median Loss Plus Plus (RML++), that addresses both drawbacks. Specifically, the training dataset is partitioned into clean and noisy subsets using a newly designed separation approach, which synergistically combines prediction consistency with an adaptive threshold to ensure reliable sampling. Moreover, to ensure that the noisy subsets can be robustly learned by models, we propose estimating the losses of noisy training samples by utilizing same-class samples from the clean subset. Subsequently, the proposed method corrects the labels of noisy samples based on the model predictions with the regularization of RML++. Compared to state-of-the-art (SOTA) methods, RML++ achieves significant improvements on both synthetic and challenging real-world datasets. The source code is available at https://github.com/Feng-peng-Li/RML-Extension.
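The abstract's core idea—estimating a noisy sample's loss from same-class clean samples via a regroup-and-median scheme—can be sketched as follows. This is only an illustrative reading of the abstract, not the authors' implementation; the function name, grouping scheme, and parameters (`group_size`, `num_groups`) are assumptions, and the released code at the repository above is the authoritative reference.

```python
import numpy as np

def regroup_median_loss(noisy_loss, clean_losses_same_class,
                        group_size=3, num_groups=9, seed=0):
    """Hypothetical sketch of a regroup-median loss estimate.

    The loss of one noisy-labeled sample is pooled with randomly drawn
    groups of losses from clean samples of the same class; each group is
    averaged, and the median across group averages is taken. The median
    suppresses the influence of an outlying (mislabeled) loss value.
    """
    rng = np.random.default_rng(seed)
    clean_losses_same_class = np.asarray(clean_losses_same_class, dtype=float)
    group_means = []
    for _ in range(num_groups):
        # Draw a small group of clean same-class losses at random.
        picks = rng.choice(clean_losses_same_class, size=group_size, replace=False)
        # Average the noisy sample's loss together with the group.
        group_means.append((noisy_loss + picks.sum()) / (group_size + 1))
    # The median across groups gives a robust per-sample loss estimate.
    return float(np.median(group_means))
```

Because every group mean mixes the noisy loss with several clean losses, a single corrupted label with a very large loss is pulled toward the clean-sample loss scale rather than dominating the gradient.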

Original language: English
Number of pages: 22
Journal: International Journal of Computer Vision
DOIs
Publication status: E-pub ahead of print - 9 Jun 2025

User-Defined Keywords

  • Label noise
  • Robust estimation
  • Weakly supervised learning
