TY - JOUR
T1 - Improving Adversarial Robustness via Mutual Information Estimation
AU - Zhou, Dawei
AU - Wang, Nannan
AU - Gao, Xinbo
AU - Han, Bo
AU - Wang, Xiaoyu
AU - Zhan, Yibing
AU - Liu, Tongliang
N1 - Funding Information:
This work was supported in part by the National Key Research and Development Program of China under Grant 2018AAA0103202, in part by the National Natural Science Foundation of China under Grants 61922066, 61876142, 62036007, 62006202, and 62002090, in part by the Technology Innovation Leading Program of Shaanxi under Grant 2022QFY01-15, in part by Open Research Projects of Zhejiang Lab under Grant 2021KG0AB01, in part by the RGC Early Career Scheme No. 22200720, in part by Guangdong Basic and Applied Basic Research Foundation No. 2022A1515011652, in part by Australian Research Council Projects DE-190101473, IC-190100031, and DP-220102121, in part by the Fundamental Research Funds for the Central Universities, and in part by the Innovation Fund of Xidian University. The authors thank the reviewers and the meta-reviewer for their helpful and constructive comments on this work.
Publisher Copyright:
Copyright © 2022 by the author(s)
PY - 2022/7/17
Y1 - 2022/7/17
N2 - Deep neural networks (DNNs) are known to be vulnerable to adversarial noise: they are typically misled by adversarial samples into making wrong predictions. To alleviate this negative effect, we investigate the dependence between the outputs of the target model and input adversarial samples from the perspective of information theory, and propose an adversarial defense method. Specifically, we first measure this dependence by estimating the mutual information (MI) between the outputs and the natural patterns of the inputs (called natural MI) and the MI between the outputs and the adversarial patterns of the inputs (called adversarial MI). We find that adversarial samples usually have larger adversarial MI and smaller natural MI than natural samples. Motivated by this observation, we propose to enhance adversarial robustness by maximizing the natural MI and minimizing the adversarial MI during training. In this way, the target model is expected to pay more attention to the natural pattern, which contains the objective semantics. Empirical evaluations demonstrate that our method effectively improves adversarial accuracy against multiple attacks.
UR - http://www.scopus.com/inward/record.url?scp=85148276880&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85148276880
SN - 2640-3498
VL - 162
SP - 27338
EP - 27352
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 39th International Conference on Machine Learning, ICML 2022
Y2 - 17 July 2022 through 23 July 2022
ER -
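Note: the abstract above describes a training objective that maximizes the mutual information between the model's output and the natural pattern of the input while minimizing the mutual information with the adversarial pattern. Below is a minimal, illustrative sketch of how such an objective could be assembled with a MINE-style (Donsker-Varadhan) MI lower bound in PyTorch. It is not the authors' implementation; the pattern definitions, the estimator class MINEEstimator, and the loss mi_defense_loss are assumptions made purely for illustration.

# Hypothetical sketch (not the paper's released code). Assumptions: the clean image
# serves as the "natural pattern", the perturbation x_adv - x_natural as the
# "adversarial pattern", and each MI term is lower-bounded with a MINE-style
# statistics network.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MINEEstimator(nn.Module):
    """Statistics network T(pattern, output) for a Donsker-Varadhan MI lower bound."""

    def __init__(self, pattern_dim, num_classes, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pattern_dim + num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def mi_lower_bound(self, patterns, outputs):
        # Joint pairs (pattern_i, output_i) vs. marginal pairs with shuffled outputs.
        joint = self.net(torch.cat([patterns, outputs], dim=1))
        perm = torch.randperm(outputs.size(0), device=outputs.device)
        marginal = self.net(torch.cat([patterns, outputs[perm]], dim=1))
        n = float(marginal.size(0))
        # I(pattern; output) >= E_joint[T] - log E_marginal[exp(T)]
        log_mean_exp = torch.logsumexp(marginal, dim=0) - torch.log(torch.tensor(n))
        return joint.mean() - log_mean_exp.squeeze()


def mi_defense_loss(model, mine_nat, mine_adv, x_natural, x_adv, labels,
                    lambda_nat=1.0, lambda_adv=1.0):
    # Cross-entropy on adversarial samples, plus terms that (when minimized)
    # maximize natural MI and minimize adversarial MI, as stated in the abstract.
    logits_adv = model(x_adv)
    probs_adv = F.softmax(logits_adv, dim=1)

    nat_pattern = x_natural.flatten(1)              # assumed natural pattern
    adv_pattern = (x_adv - x_natural).flatten(1)    # assumed adversarial pattern

    nat_mi = mine_nat.mi_lower_bound(nat_pattern, probs_adv)
    adv_mi = mine_adv.mi_lower_bound(adv_pattern, probs_adv)

    ce = F.cross_entropy(logits_adv, labels)
    return ce - lambda_nat * nat_mi + lambda_adv * adv_mi

A training step would backpropagate this loss through both the classifier and the two estimator networks; how the natural and adversarial patterns are actually disentangled is the core contribution of the paper and is not reproduced in this sketch.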