Stronger Separability, Stronger Defense: Influence-Based Backdoor Detection

  • Buhua Liu
  • Shuo Yang
  • Zhiqiang Xu
  • Haoyi Xiong
  • Yiu Ming Cheung
  • Zeke Xie*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

Abstract

Deep Neural Networks (DNNs) are susceptible to backdoor attacks, in which an attacker inserts hidden functionality into a DNN by manipulating a small amount of training data, without compromising the victim DNN’s normal functionality. To defend against such attacks, one line of work detects suspicious samples before training, relying on the latent separability assumption that clean and poison samples can be separated in the representation space learned by a trained DNN. However, recent strong backdoor attacks can easily break representation separability, rendering existing detection methods ineffective. To this end, we propose to detect poison samples in influence space by tracing each sample’s influence on model parameters instead of on conventional model outputs. We show that influence separability is significantly stronger than conventional representation separability in terms of four common statistics (e.g., the Silhouette Score increases by 122% on average). With such strong separability in influence space, stronger backdoor detection and defense can be obtained by applying existing methods, or even simple statistics, in influence space. Extensive experiments show that our influence-based methods significantly outperform conventional representation-based baselines against eight representative backdoor attacks. In particular, influence space reduces the average attack success rate by 43.4 percentage points (47.2% → 3.8%) across three benchmark datasets compared with representation space.
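The core idea of the abstract — scoring each training sample by its influence on model parameters (via per-sample loss gradients) rather than by its learned representation — can be illustrated with a minimal, hypothetical sketch. This is not the paper's implementation: it substitutes a toy logistic-regression model, a simple label-flipping poison, and per-sample gradient norms as a crude stand-in for a proper influence-function estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification: two well-separated Gaussian clusters.
n = 200
X0 = rng.normal(loc=-2.0, scale=0.5, size=(n, 2))
X1 = rng.normal(loc=+2.0, scale=0.5, size=(n, 2))
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Simulate a crude "poison" by flipping the labels of 5% of samples.
poison_idx = rng.choice(2 * n, size=20, replace=False)
y_poisoned = y.copy()
y_poisoned[poison_idx] = 1.0 - y_poisoned[poison_idx]

# Train logistic regression on the poisoned data with plain gradient descent.
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= lr * ((p - y_poisoned) @ X) / len(X)
    b -= lr * np.mean(p - y_poisoned)

# Per-sample loss gradients w.r.t. the parameters serve as an
# influence-style signature: g_i = (p_i - y_i) * [x_i, 1].
p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
resid = p - y_poisoned
G = np.hstack([resid[:, None] * X, resid[:, None]])
scores = np.linalg.norm(G, axis=1)  # influence-style score per sample

clean_mask = np.ones(2 * n, dtype=bool)
clean_mask[poison_idx] = False
```

In this toy setup, the mislabeled samples are poorly fit by the model trained on the majority-clean data, so their parameter-space gradients are much larger than those of clean samples, and a simple threshold on `scores` separates the two groups — mirroring, in spirit, the stronger separability the paper reports for influence space.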

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining
Subtitle of host publication: 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2025, Sydney, NSW, Australia, June 10–13, 2025, Proceedings, Part I
Editors: Xintao Wu, Myra Spiliopoulou, Can Wang, Vipin Kumar, Longbing Cao, Yanqiu Wu, Zhangkai Wu, Yu Yao
Place of Publication: Singapore
Publisher: Springer
Pages: 108-120
Number of pages: 13
Edition: 1
ISBN (Electronic): 9789819681709
ISBN (Print): 9789819681693
DOIs
Publication status: Published - 14 Jun 2025
Event: 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining - Sydney Masonic Centre, Sydney, Australia
Duration: 10 Jun 2025 – 13 Jun 2025
https://pakdd2025.org/ (Conference website)
https://link.springer.com/book/10.1007/978-981-96-8170-9#overview (Conference proceeding)

Publication series

Name: Lecture Notes in Computer Science
Volume: 15870
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
Name: Lecture Notes in Artificial Intelligence
ISSN (Print): 2945-9133
ISSN (Electronic): 2945-9141
Name: PAKDD: Pacific-Asia Conference on Knowledge Discovery and Data Mining
Publisher: Springer

Conference

Conference: 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining
Abbreviated title: PAKDD 2025
Country/Territory: Australia
City: Sydney
Period: 10/06/25 – 13/06/25

User-Defined Keywords

  • Backdoor attack
  • Backdoor defense
  • Influence function

