Abstract
Given data with noisy labels, over-parameterized deep networks tend to overfit the mislabeled data, resulting in poor generalization. The memorization effect of deep networks shows that although the networks have the capacity to memorize all noisy data, they first memorize clean training data and only gradually memorize mislabeled training data. A simple and effective method that exploits the memorization effect to combat noisy labels is early stopping. However, early stopping cannot distinguish the memorization of clean data from that of mislabeled data, so the network still inevitably overfits mislabeled data in the early training stage. In this paper, to decouple the memorization of clean data and mislabeled data, and to further reduce the side effects of mislabeled data, we perform an additive decomposition on the network parameters. Namely, all parameters are additively decomposed into two groups: the parameters w are decomposed as w = σ + γ. The parameters σ are then designated to memorize clean data, while the parameters γ are designated to memorize mislabeled data. Benefiting from the memorization effect, the updates of σ are encouraged to fully memorize clean data in early training and are then discouraged as training epochs increase, to reduce interference from mislabeled data; the updates of γ follow the opposite schedule. At test time, only the parameters σ are employed, which enhances generalization. Extensive experiments on both simulated and real-world benchmarks confirm the superior performance of our method.
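The decomposition lends itself to a compact implementation. Below is a minimal PyTorch sketch of the idea as the abstract describes it; the layer class, the linear learning-rate schedule, and all names are illustrative assumptions, not the authors' published code.

```python
# A minimal sketch of the additive decomposition w = σ + γ from the abstract.
# Assumes PyTorch; the class name, the linear learning-rate schedule, and the
# `use_gamma` switch are illustrative choices, not the authors' implementation.
import torch
import torch.nn as nn


class DecomposedLinear(nn.Module):
    """Linear layer whose weight is stored as two additive parts, sigma + gamma."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.gamma = nn.Parameter(torch.zeros(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        nn.init.kaiming_uniform_(self.sigma)

    def forward(self, x: torch.Tensor, use_gamma: bool = True) -> torch.Tensor:
        # Training uses the full weight w = sigma + gamma; testing drops gamma,
        # the part intended to absorb mislabeled data.
        weight = self.sigma + self.gamma if use_gamma else self.sigma
        return nn.functional.linear(x, weight, self.bias)


layer = DecomposedLinear(128, 10)

# Separate optimizers so the two parameter groups can follow opposite schedules.
opt_sigma = torch.optim.SGD([layer.sigma, layer.bias], lr=0.1)
opt_gamma = torch.optim.SGD([layer.gamma], lr=0.0)

total_epochs = 100
for epoch in range(total_epochs):
    frac = epoch / total_epochs
    for group in opt_sigma.param_groups:
        group["lr"] = 0.1 * (1.0 - frac)  # encourage sigma early, damp it later
    for group in opt_gamma.param_groups:
        group["lr"] = 0.1 * frac          # the opposite schedule for gamma
    # ... per-batch: out = layer(x), loss.backward(), step both optimizers
```

At test time, calling `layer(x, use_gamma=False)` employs only σ, matching the evaluation protocol described in the abstract.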
| Original language | English |
| --- | --- |
| Pages (from-to) | 6341-6354 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
| Volume | 46 |
| Issue number | 9 |
| Early online date | 28 Mar 2024 |
| DOIs | |
| Publication status | Published - Sept 2024 |
Scopus Subject Areas
- Software
- Artificial Intelligence
- Applied Mathematics
- Computer Vision and Pattern Recognition
- Computational Theory and Mathematics
User-Defined Keywords
- early stopping
- learning with noisy labels
- memorization effect
- parameter decomposition