TY - JOUR
T1 - RML++: Regroup Median Loss for Combating Label Noise
AU - Li, Fengpeng
AU - Li, Kemou
AU - Wang, Qizhou
AU - Han, Bo
AU - Tian, Jinyu
AU - Zhou, Jiantao
N1 - This work was supported in part by the Macau Science and Technology Development Fund under Grants SKLIOTSC-2021-2023, 0022/2022/A1, and 0119/2024/RIB2; in part by the Research Committee of the University of Macau under Grant MYRG-GRG2023-00058-FST-UMDF; and in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515012536.
Publisher Copyright:
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
PY - 2025/6/9
Y1 - 2025/6/9
N2 - Training deep neural networks (DNNs) typically necessitates large-scale, high-quality annotated datasets. However, because precisely annotating vast numbers of training samples is inherently difficult, label noise, i.e., potentially erroneous annotations, is common yet detrimental in practice. To combat the negative impacts of label noise, mainstream studies currently follow a pipeline of data sampling followed by loss correction. Data sampling aims to partition the original training dataset into clean and noisy subsets, but it often suffers from biased sampling that can mislead models. Additionally, loss correction typically requires the noise rate as a priori knowledge, which can be difficult to estimate precisely. To this end, we propose a novel method, Regroup Median Loss Plus Plus (RML++), that addresses both drawbacks. Specifically, the training dataset is partitioned into clean and noisy subsets using a newly designed separation approach, which combines prediction consistency with an adaptive threshold to ensure reliable sampling. Moreover, to ensure that models can robustly learn from the noisy subset, we estimate the losses of noisy training samples using same-class samples from the clean subset. Subsequently, the proposed method corrects the labels of noisy samples based on model predictions regularized by RML++. Compared to state-of-the-art (SOTA) methods, RML++ achieves significant improvements on both synthetic and challenging real-world datasets. The source code is available at https://github.com/Feng-peng-Li/RML-Extension.
AB - Training deep neural networks (DNNs) typically necessitates large-scale, high-quality annotated datasets. However, because precisely annotating vast numbers of training samples is inherently difficult, label noise, i.e., potentially erroneous annotations, is common yet detrimental in practice. To combat the negative impacts of label noise, mainstream studies currently follow a pipeline of data sampling followed by loss correction. Data sampling aims to partition the original training dataset into clean and noisy subsets, but it often suffers from biased sampling that can mislead models. Additionally, loss correction typically requires the noise rate as a priori knowledge, which can be difficult to estimate precisely. To this end, we propose a novel method, Regroup Median Loss Plus Plus (RML++), that addresses both drawbacks. Specifically, the training dataset is partitioned into clean and noisy subsets using a newly designed separation approach, which combines prediction consistency with an adaptive threshold to ensure reliable sampling. Moreover, to ensure that models can robustly learn from the noisy subset, we estimate the losses of noisy training samples using same-class samples from the clean subset. Subsequently, the proposed method corrects the labels of noisy samples based on model predictions regularized by RML++. Compared to state-of-the-art (SOTA) methods, RML++ achieves significant improvements on both synthetic and challenging real-world datasets. The source code is available at https://github.com/Feng-peng-Li/RML-Extension.
KW - Label noise
KW - Robust estimation
KW - Weakly supervised learning
UR - http://www.scopus.com/inward/record.url?scp=105007647777&partnerID=8YFLogxK
U2 - 10.1007/s11263-025-02494-4
DO - 10.1007/s11263-025-02494-4
M3 - Journal article
AN - SCOPUS:105007647777
SN - 0920-5691
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
ER -