TY - GEN
T1 - Reliable Adversarial Distillation with Unreliable Teachers
AU - Zhu, Jianing
AU - Yao, Jiangchao
AU - Han, Bo
AU - Zhang, Jingfeng
AU - Liu, Tongliang
AU - Niu, Gang
AU - Zhou, Jingren
AU - Xu, Jianliang
AU - Yang, Hongxia
N1 - Funding Information:
JNZ and BH were supported by the RGC Early Career Scheme No. 22200720, NSFC Young Scientists Fund No. 62006202 and HKBU CSD Departmental Incentive Grant. JCY and HXY were supported by NSFC No. U20A20222. JFZ was supported by JST, ACT-X Grant Number JPMJAX21AF. TLL was supported by Australian Research Council Projects DE-190101473 and DP-220102121. JLX was supported by RGC Grant C6030-18GF.
Publisher Copyright:
© 2022 ICLR 2022 - 10th International Conference on Learning Representations. All rights reserved.
PY - 2022/4/25
Y1 - 2022/4/25
N2 - In ordinary distillation, student networks are trained with soft labels (SLs) given by pretrained teacher networks, and students are expected to improve upon teachers since SLs are stronger supervision than the original hard labels. However, when considering adversarial robustness, teachers may become unreliable and adversarial distillation may not work: teachers are pretrained on their own adversarial data, and it is too demanding to require that teachers are also good at all the adversarial data queried by students. Therefore, in this paper, we propose reliable introspective adversarial distillation (IAD), in which students partially instead of fully trust their teachers. Specifically, IAD distinguishes between three cases given a query of natural data (ND) and the corresponding adversarial data (AD): (a) if a teacher is good at AD, its SL is fully trusted; (b) if a teacher is good at ND but not AD, its SL is partially trusted and the student also takes its own SL into account; (c) otherwise, the student relies only on its own SL. Experiments demonstrate the effectiveness of IAD for improving upon teachers in terms of adversarial robustness.
AB - In ordinary distillation, student networks are trained with soft labels (SLs) given by pretrained teacher networks, and students are expected to improve upon teachers since SLs are stronger supervision than the original hard labels. However, when considering adversarial robustness, teachers may become unreliable and adversarial distillation may not work: teachers are pretrained on their own adversarial data, and it is too demanding to require that teachers are also good at all the adversarial data queried by students. Therefore, in this paper, we propose reliable introspective adversarial distillation (IAD), in which students partially instead of fully trust their teachers. Specifically, IAD distinguishes between three cases given a query of natural data (ND) and the corresponding adversarial data (AD): (a) if a teacher is good at AD, its SL is fully trusted; (b) if a teacher is good at ND but not AD, its SL is partially trusted and the student also takes its own SL into account; (c) otherwise, the student relies only on its own SL. Experiments demonstrate the effectiveness of IAD for improving upon teachers in terms of adversarial robustness.
UR - https://openreview.net/forum?id=u6TRGdzhfip
UR - http://www.scopus.com/inward/record.url?scp=85142909789&partnerID=8YFLogxK
U2 - 10.48550/arXiv.2106.04928
DO - 10.48550/arXiv.2106.04928
M3 - Conference proceeding
BT - Proceedings of the Tenth International Conference on Learning Representations, ICLR 2022
PB - International Conference on Learning Representations
T2 - 10th International Conference on Learning Representations, ICLR 2022
Y2 - 25 April 2022 through 29 April 2022
ER -