TY - GEN
T1 - Meme Trojan
T2 - Backdoor Attacks Against Hateful Meme Detection via Cross-Modal Triggers
AU - Wang, Ruofei
AU - Lin, Hongzhan
AU - Luo, Ziyuan
AU - Cheung, Ka Chun
AU - See, Simon
AU - Ma, Jing
AU - Wan, Renjie
PY - 2025/2/25
Y1 - 2025/2/25
N2 - Hateful meme detection aims to prevent the proliferation of hateful memes on various social media platforms. Considering its impact on social environments, this paper introduces a previously ignored but significant threat to hateful meme detection: backdoor attacks. By injecting specific triggers into meme samples, backdoor attackers can manipulate the detector to output their desired outcomes. To explore this, we propose the Meme Trojan framework to initiate backdoor attacks on hateful meme detection. Meme Trojan involves creating a novel Cross-Modal Trigger (CMT) and a learnable trigger augmentor to enhance the trigger pattern according to each input sample. Due to the cross-modal property, the proposed CMT can effectively initiate backdoor attacks on hateful meme detectors under an automatic application scenario. Additionally, the injection position and size of our triggers are adaptive to the texts contained in the meme, which ensures that the trigger is seamlessly integrated with the meme content. Our approach outperforms the state-of-the-art backdoor attack methods, showing significant improvements in effectiveness and stealthiness. We believe that this paper will draw more attention to the potential threat posed by backdoor attacks on hateful meme detection.
U2 - 10.1609/aaai.v39i8.32845
DO - 10.1609/aaai.v39i8.32845
M3 - Conference proceeding
SN - 157735897X
SN - 9781577358978
T3 - Proceedings of the AAAI Conference on Artificial Intelligence
SP - 7844
EP - 7852
BT - Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence
PB - Association for the Advancement of Artificial Intelligence
ER -