TY - JOUR
T1 - Tackling Noisy Labels with Network Parameter Additive Decomposition
AU - Wang, Jingyi
AU - Xia, Xiaobo
AU - Lan, Long
AU - Wu, Xinghao
AU - Yu, Jun
AU - Yang, Wenjing
AU - Han, Bo
AU - Liu, Tongliang
N1 - This work was supported by the National Natural Science Foundation of China (No. 62376282, No. 62372459).
Publisher Copyright:
© IEEE.
PY - 2024/9
Y1 - 2024/9
N2 - Given data with noisy labels, over-parameterized deep networks suffer from
overfitting mislabeled data, resulting in poor generalization. The memorization
effect of deep networks shows that although the networks have the capacity to
memorize all noisy data, they first memorize clean training data and only
gradually memorize mislabeled training data. A simple and effective method that
exploits the memorization effect to combat noisy labels is early stopping.
However, early stopping cannot distinguish the memorization of clean data from
that of mislabeled data, so the network still inevitably overfits mislabeled
data in the early training stage. In this paper, to decouple the memorization
of clean data and mislabeled data, and to further reduce the side effect of
mislabeled data, we perform additive decomposition on network parameters.
Namely, all parameters are additively decomposed into two groups, i.e.,
parameters w are decomposed as w = σ + γ. Afterward, the parameters σ are
considered to memorize clean data, while the parameters γ are considered to
memorize mislabeled data. Benefiting from the memorization effect, the updates
of the parameters σ are encouraged to fully memorize clean data in early
training and then discouraged as training proceeds, to reduce interference from
mislabeled data. The updates of the parameters γ are the opposite. In testing,
only the parameters σ are employed to enhance generalization. Extensive
experiments on both simulated and real-world benchmarks confirm the superior
performance of our method.
AB - Given data with noisy labels, over-parameterized deep networks suffer from
overfitting mislabeled data, resulting in poor generalization. The memorization
effect of deep networks shows that although the networks have the capacity to
memorize all noisy data, they first memorize clean training data and only
gradually memorize mislabeled training data. A simple and effective method that
exploits the memorization effect to combat noisy labels is early stopping.
However, early stopping cannot distinguish the memorization of clean data from
that of mislabeled data, so the network still inevitably overfits mislabeled
data in the early training stage. In this paper, to decouple the memorization
of clean data and mislabeled data, and to further reduce the side effect of
mislabeled data, we perform additive decomposition on network parameters.
Namely, all parameters are additively decomposed into two groups, i.e.,
parameters w are decomposed as w = σ + γ. Afterward, the parameters σ are
considered to memorize clean data, while the parameters γ are considered to
memorize mislabeled data. Benefiting from the memorization effect, the updates
of the parameters σ are encouraged to fully memorize clean data in early
training and then discouraged as training proceeds, to reduce interference from
mislabeled data. The updates of the parameters γ are the opposite. In testing,
only the parameters σ are employed to enhance generalization. Extensive
experiments on both simulated and real-world benchmarks confirm the superior
performance of our method.
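N1 - Illustration (not part of the published record): the abstract describes an
additive decomposition w = σ + γ with opposite update schedules for σ and γ and
σ-only inference. The following is a minimal, hypothetical PyTorch-style sketch
of that idea, not the authors' implementation; the layer design, the linear
ramp schedule, and all hyperparameters below are assumptions made for
illustration only.

# Hypothetical sketch of additive parameter decomposition w = sigma + gamma.
# Not the authors' code; schedules and hyperparameters are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposedLinear(nn.Module):
    """Linear layer whose weight is stored as w = sigma + gamma."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.sigma = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.gamma = nn.Parameter(torch.zeros(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x, use_gamma=True):
        # Both parts contribute in training; only sigma is kept at test time.
        weight = self.sigma + self.gamma if use_gamma else self.sigma
        return F.linear(x, weight, self.bias)

def lr_weights(epoch, total_epochs):
    # Assumed schedule: sigma updates dominate early, gamma updates dominate late.
    t = epoch / max(total_epochs - 1, 1)
    return 1.0 - t, t  # (sigma scale, gamma scale)

model = DecomposedLinear(in_features=784, out_features=10)
base_lr, total_epochs = 0.1, 20
opt_sigma = torch.optim.SGD([model.sigma, model.bias], lr=base_lr)
opt_gamma = torch.optim.SGD([model.gamma], lr=base_lr)

for epoch in range(total_epochs):
    w_sigma, w_gamma = lr_weights(epoch, total_epochs)
    for g in opt_sigma.param_groups:
        g["lr"] = base_lr * w_sigma
    for g in opt_gamma.param_groups:
        g["lr"] = base_lr * w_gamma

    x = torch.randn(32, 784)                # stand-in for a (possibly noisy) batch
    y = torch.randint(0, 10, (32,))
    loss = F.cross_entropy(model(x), y)
    opt_sigma.zero_grad()
    opt_gamma.zero_grad()
    loss.backward()
    opt_sigma.step()
    opt_gamma.step()

# Inference: drop gamma, the component assumed to have absorbed mislabeled data.
with torch.no_grad():
    logits = model(torch.randn(4, 784), use_gamma=False)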
KW - early stopping
KW - learning with noisy labels
KW - memorization effect
KW - parameter decomposition
UR - http://www.scopus.com/inward/record.url?scp=85189324750&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2024.3382138
DO - 10.1109/TPAMI.2024.3382138
M3 - Journal article
SN - 0162-8828
VL - 46
SP - 6341
EP - 6354
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 9
ER -