TY - GEN
T1 - Attacks which do not kill training make adversarial learning stronger
AU - Zhang, Jingfeng
AU - Xu, Xilie
AU - Han, Bo
AU - Niu, Gang
AU - Cui, Lizhen
AU - Sugiyama, Masashi
AU - Kankanhalli, Mohan
N1 - Funding Information:
This research is supported by the National Research Foundation, Singapore under its Strategic Capability Research Centres Funding Initiative (MK, JZ), JST AIP Acceleration Research Grant Number JPMJCR20U3, Japan (GN, MS), the National Key R&D Program No. 2017YFB1400100, the NSFC No. 91846205, the Shandong Key R&D Program No. 2018YFJH0506 (LC), the Early Career Scheme (ECS) through the Research Grants Council of Hong Kong under Grant No. 22200720 (BH), the HKBU Tier-1 Start-up Grant (BH), and the HKBU CSD Start-up Grant (BH). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore.
PY - 2020/7
Y1 - 2020/7
N2 - Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models. However, it is conservative or even pessimistic, so that it sometimes hurts natural generalization. In this paper, we raise a fundamental question: do we have to trade off natural generalization for adversarial robustness? We argue that the purpose of adversarial training is to employ confident adversarial data for updating the current model. We propose a novel formulation of friendly adversarial training (FAT): rather than employing the most adversarial data that maximize the loss, we search for the least adversarial data (i.e., friendly adversarial data) that minimize the loss, among the adversarial data that are confidently misclassified. Our novel formulation is easy to implement by simply stopping the search for the most adversarial data early in algorithms such as PGD (projected gradient descent), which we call early-stopped PGD. Theoretically, FAT is justified by an upper bound on the adversarial risk. Empirically, early-stopped PGD allows us to answer the earlier question negatively: adversarial robustness can indeed be achieved without compromising natural generalization.
AB - Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models. However, it is conservative or even pessimistic, so that it sometimes hurts natural generalization. In this paper, we raise a fundamental question: do we have to trade off natural generalization for adversarial robustness? We argue that the purpose of adversarial training is to employ confident adversarial data for updating the current model. We propose a novel formulation of friendly adversarial training (FAT): rather than employing the most adversarial data that maximize the loss, we search for the least adversarial data (i.e., friendly adversarial data) that minimize the loss, among the adversarial data that are confidently misclassified. Our novel formulation is easy to implement by simply stopping the search for the most adversarial data early in algorithms such as PGD (projected gradient descent), which we call early-stopped PGD. Theoretically, FAT is justified by an upper bound on the adversarial risk. Empirically, early-stopped PGD allows us to answer the earlier question negatively: adversarial robustness can indeed be achieved without compromising natural generalization.
UR - http://www.scopus.com/inward/record.url?scp=85105322490&partnerID=8YFLogxK
UR - https://proceedings.mlr.press/v119/zhang20z.html
M3 - Conference proceeding
AN - SCOPUS:85105322490
T3 - Proceedings of Machine Learning Research
SP - 11214
EP - 11224
BT - Proceedings of the 37th International Conference on Machine Learning, ICML 2020
A2 - Daumé III, Hal
A2 - Singh, Aarti
PB - ML Research Press
T2 - 37th International Conference on Machine Learning, ICML 2020
Y2 - 13 July 2020 through 18 July 2020
ER -
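
As a rough illustration of the early-stopped PGD idea summarized in the abstract above, here is a minimal, hypothetical PyTorch sketch. It is not the authors' released implementation; the function name `early_stopped_pgd`, the batch-level stopping check, the default `epsilon`/`step_size` values, and the omission of a random start are all illustrative assumptions.

```python
# Hypothetical sketch of early-stopped PGD for friendly adversarial
# training (FAT): run PGD, but stop a few (tau) extra steps after the
# iterate is first misclassified, returning "friendly" adversarial data
# that only mildly crosses the decision boundary.
import torch
import torch.nn.functional as F

def early_stopped_pgd(model, x, y, epsilon=8/255, step_size=2/255,
                      max_steps=10, tau=1):
    # Inputs are assumed to be images in [0, 1]; a random start is omitted.
    x_adv = x.clone().detach()
    budget = tau
    for _ in range(max_steps):
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        # Simplification: stop once the whole batch is misclassified;
        # the per-example criterion of the paper would track each sample.
        if (pred != y).all():
            if budget == 0:
                break
            budget -= 1
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            # Gradient-ascent step on the loss, projected onto the
            # L-infinity ball of radius epsilon around x.
            x_adv = x_adv + step_size * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
            x_adv = x_adv.clamp(0.0, 1.0)
        x_adv = x_adv.detach()
    return x_adv
```

The returned friendly adversarial data would then replace the usual most-adversarial PGD data in the outer minimization of adversarial training; larger `tau` makes the generated data more adversarial, recovering standard PGD-based training in the limit.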