TY - GEN
T1 - How does disagreement help generalization against label corruption?
AU - Yu, Xingrui
AU - Han, Bo
AU - Yao, Jiangchao
AU - Niu, Gang
AU - Tsang, Ivor W.
AU - Sugiyama, Masashi
N1 - MS was supported by JST CREST JPMJCR18A2. IWT was supported by ARC FT130100746, DP180100106 and LP150100671. XRY was supported by China Scholarship Council No. 201806450045. We gratefully acknowledge the support of NVIDIA Corporation with the donation of Titan Xp GPU used for this research.
Publisher Copyright:
Copyright © 2019 by the author(s)
PY - 2019/6/9
Y1 - 2019/6/9
N2 - Learning with noisy labels is one of the hottest problems in weakly-supervised learning. Based on memorization effects of deep neural networks, training on small-loss instances becomes very promising for handling noisy labels. This fosters the state-of-the-art approach "Co-teaching" that cross-trains two deep neural networks using the small-loss trick. However, with the increase of epochs, two networks converge to a consensus and Co-teaching reduces to the self-training MentorNet. To tackle this issue, we propose a robust learning paradigm called Co-teaching+, which bridges the "Update by Disagreement" strategy with the original Co-teaching. First, two networks feed forward and predict all data, but keep prediction disagreement data only. Then, among such disagreement data, each network selects its small-loss data, but back propagates the small-loss data from its peer network and updates its own parameters. Empirical results on benchmark datasets demonstrate that Co-teaching+ is much superior to many state-of-the-art methods in the robustness of trained models.
AB - Learning with noisy labels is one of the hottest problems in weakly-supervised learning. Based on memorization effects of deep neural networks, training on small-loss instances becomes very promising for handling noisy labels. This fosters the state-of-the-art approach "Co-teaching" that cross-trains two deep neural networks using the small-loss trick. However, with the increase of epochs, two networks converge to a consensus and Co-teaching reduces to the self-training MentorNet. To tackle this issue, we propose a robust learning paradigm called Co-teaching+, which bridges the "Update by Disagreement" strategy with the original Co-teaching. First, two networks feed forward and predict all data, but keep prediction disagreement data only. Then, among such disagreement data, each network selects its small-loss data, but back propagates the small-loss data from its peer network and updates its own parameters. Empirical results on benchmark datasets demonstrate that Co-teaching+ is much superior to many state-of-the-art methods in the robustness of trained models.
UR - https://proceedings.mlr.press/v97/yu19b.html
UR - https://www.scopus.com/pages/publications/85078300446
M3 - Conference contribution
AN - SCOPUS:85078300446
T3 - International Conference on Machine Learning, ICML
SP - 7164
EP - 7173
BT - Proceedings of the 36th International Conference on Machine Learning, ICML 2019
A2 - Chaudhuri, Kamalika
A2 - Salakhutdinov, Ruslan
PB - ML Research Press
T2 - 36th International Conference on Machine Learning, ICML 2019
Y2 - 9 June 2019 through 15 June 2019
ER -
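
Note: the abstract above outlines the Co-teaching+ update rule (disagreement filtering, small-loss selection, cross-update). Below is a minimal PyTorch sketch of one such training step, written only to illustrate that description. All names (coteaching_plus_step, net_a, net_b, keep_ratio) are hypothetical, and this is not the authors' released implementation; in the paper the kept fraction is scheduled over epochs using the estimated noise rate, which is simplified here to a fixed keep_ratio.

import torch
import torch.nn.functional as F

def coteaching_plus_step(net_a, net_b, opt_a, opt_b, x, y, keep_ratio):
    """One illustrative Co-teaching+ step: keep disagreement data, then cross-train on small-loss subsets."""
    # 1. Both networks feed forward and predict all data; keep only the
    #    samples on which their predictions disagree.
    with torch.no_grad():
        disagree = net_a(x).argmax(dim=1) != net_b(x).argmax(dim=1)
    if disagree.sum() == 0:
        return  # no disagreement data in this batch; skip the update
    x_d, y_d = x[disagree], y[disagree]

    # 2. Among the disagreement data, each network ranks samples by its own
    #    per-sample loss and keeps the small-loss fraction (treated as
    #    likely-clean under the memorization effect).
    with torch.no_grad():
        loss_a = F.cross_entropy(net_a(x_d), y_d, reduction="none")
        loss_b = F.cross_entropy(net_b(x_d), y_d, reduction="none")
    k = max(1, int(keep_ratio * len(y_d)))
    idx_a = torch.argsort(loss_a)[:k]  # small-loss set selected by net_a
    idx_b = torch.argsort(loss_b)[:k]  # small-loss set selected by net_b

    # 3. Cross update: each network back-propagates the small-loss data
    #    selected by its peer and updates its own parameters.
    opt_a.zero_grad()
    F.cross_entropy(net_a(x_d[idx_b]), y_d[idx_b]).backward()
    opt_a.step()

    opt_b.zero_grad()
    F.cross_entropy(net_b(x_d[idx_a]), y_d[idx_a]).backward()
    opt_b.step()

The cross update is the point of the design: because each network trains on data chosen by its peer rather than by itself, the two networks are kept from collapsing to a consensus, which is the failure mode the abstract attributes to plain Co-teaching.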