TY - JOUR
T1 - ExplainHM++: Explainable Harmful Meme Detection With Retrieval-Augmented Debate Between Large Multimodal Models
T2 - IEEE Transactions on Knowledge and Data Engineering
AU - Lin, Hongzhan
AU - Gao, Wei
AU - Ma, Jing
AU - Deng, Yang
AU - Luo, Ziyang
AU - Wang, Bo
AU - Yang, Ruichao
AU - Chua, Tat-Seng
N1 - Publisher Copyright:
© 2025 IEEE. All rights reserved, including rights for text and data mining and training of artificial intelligence and similar technologies.
PY - 2025
Y1 - 2025/11/26
AB - Identifying harmful memes is challenging due to their implicit meanings, which are not always evident from texts and images alone. Existing solutions often lack clear explanations to justify their decisions. To address this gap, we propose an explainable approach, ExplainHM++, which detects harmful memes by reasoning over competing rationales from both harmful and harmless perspectives. First, inspired by the capabilities of Large Multimodal Models (LMMs) in text generation and multimodal reasoning, we develop ExplainHM, a one-stage multimodal debate framework in which LMMs generate explanations through contradictory arguments. Second, we fine-tune a small language model to serve as a judge in the debate, improving the integration of harmfulness rationales with the multimodal content of memes. However, we observe that a naive multimodal debate remains vulnerable, as it depends heavily on the inherent reasoning ability of LMMs to understand the memes. Given the evolving and noisy nature of memes, we further introduce a meme sample retrieval mechanism and a retrieval-augmented debate paradigm to strengthen and refine LMM-generated explanations. Extensive experiments on three public meme datasets demonstrate that ExplainHM++ not only outperforms state-of-the-art methods but also provides superior, interpretable explanations for harmful meme detection.
KW - Harmful meme detection
KW - Explainability
KW - Retrieval-augmented debate
KW - Large multimodal models
UR - https://www.scopus.com/pages/publications/105023200697
DO - 10.1109/TKDE.2025.3637552
M3 - Journal article
AN - SCOPUS:105023200697
SN - 1041-4347
VL - 38
SP - 1
EP - 14
JO - IEEE Trans. Knowl. Data Eng.
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 2
ER -