Towards Defending against Adversarial Examples via Attack-Invariant Features

Dawei Zhou, Tongliang Liu, Bo Han, Nannan Wang*, Chunlei Peng, Xinbo Gao

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

17 Citations (Scopus)


Deep neural networks (DNNs) are vulnerable to adversarial noise. Their adversarial robustness can be improved by exploiting adversarial examples. However, given continuously evolving attacks, models trained on seen types of adversarial examples generally cannot generalize well to unseen types. To solve this problem, in this paper, we propose to remove adversarial noise by learning generalizable invariant features across attacks that maintain semantic classification information. Specifically, we introduce an adversarial feature learning mechanism to disentangle invariant features from adversarial noise. We further propose a normalization term in the encoded space of the attack-invariant features to address the bias between seen and unseen types of attacks. Empirical evaluations demonstrate that our method provides better protection than previous state-of-the-art approaches, especially against unseen types of attacks and adaptive attacks.
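The abstract's idea of normalizing attack-invariant features to reduce the bias between seen and unseen attacks can be illustrated with a toy sketch. This is a minimal, hypothetical reading of that idea, not the authors' implementation: the linear "encoder" and the unit-sphere normalization below are stand-in assumptions for the learned networks and the paper's actual normalization term.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w_inv, w_spec):
    # hypothetical linear encoder that disentangles an input into an
    # attack-invariant part and an attack-specific part (stand-ins for
    # the learned networks in the paper)
    return x @ w_inv, x @ w_spec

def normalize(z, eps=1e-8):
    # project invariant features onto the unit sphere: one plausible
    # form of a normalization term that removes scale bias between
    # feature distributions of different attack types (assumption)
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + eps)

# toy adversarial examples from a "seen" and an "unseen" attack type,
# where the unseen attack produces much larger perturbations
x_seen = rng.normal(size=(4, 8))
x_unseen = rng.normal(size=(4, 8)) * 5.0

w_inv = rng.normal(size=(8, 3))
w_spec = rng.normal(size=(8, 3))

z_seen, _ = encode(x_seen, w_inv, w_spec)
z_unseen, _ = encode(x_unseen, w_inv, w_spec)

# after normalization both sets of invariant features lie on the unit
# sphere, so their magnitudes agree regardless of attack strength
n_seen = np.linalg.norm(normalize(z_seen), axis=-1)
n_unseen = np.linalg.norm(normalize(z_unseen), axis=-1)
print(np.allclose(n_seen, 1.0), np.allclose(n_unseen, 1.0))
```

The point of the sketch is only the distributional effect: without normalization the unseen attack's features would have roughly five times the scale of the seen attack's, which is exactly the kind of bias a normalization term in the encoded space is meant to remove.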
Original language: English
Title of host publication: Proceedings of the 38th International Conference on Machine Learning (ICML 2021)
Editors: Marina Meila, Tong Zhang
Publisher: ML Research Press
Number of pages: 11
Publication status: Published - 18 Jul 2021
Event: 38th International Conference on Machine Learning, ICML 2021 - Virtual
Duration: 18 Jul 2021 - 24 Jul 2021

Publication series

Name: Proceedings of Machine Learning Research
ISSN (Print): 2640-3498


Conference: 38th International Conference on Machine Learning, ICML 2021

