TY - JOUR
T1 - Modality Fused Class-Proxy with Knowledge Distillation for Zero-Shot Sketch-based Image Retrieval
AU - Li, Chang-Xing
AU - Zhang, Donglin
AU - Hu, Zhikai
AU - Wu, Xiao-Jun
N1 - This work was supported by the National Natural Science Foundation of China (62202204), the Fundamental Research Funds for the Central Universities (JUSRP123032) and the National Key Research and Development Program of China (2023YFF1105102).
Publisher Copyright:
© 2025 IEEE.
PY - 2025/1/15
Y1 - 2025/1/15
N2 - In recent years, the zero-shot sketch-based image retrieval (ZS-SBIR) task has attracted considerable attention. Although several ZS-SBIR approaches have been proposed, modeling the inherent linkages between the sketch and image domains remains challenging. Moreover, how to transfer semantic knowledge from seen categories to unseen categories is still an open problem that significantly affects retrieval performance. In this article, we propose a novel approach, Modality Fused Class-Proxy with Knowledge Distillation (MFCPKD), which develops two novel schemes to remedy these issues. Specifically, MFCPKD leverages a Modality Fusion Model to learn modality-fused feature embeddings and class proxies. Knowledge distillation is employed so that the student network learns features from seen categories and infers unseen categories through class proxies. Furthermore, three losses constrain the student network to narrow the modality gap between the sketch and image domains. Finally, extensive experiments on three benchmark datasets (Sketchy Ext, TU-Berlin Ext, and QuickDraw Ext) demonstrate that MFCPKD achieves excellent performance compared with existing methods in ZS-SBIR scenarios.
AB - In recent years, the zero-shot sketch-based image retrieval (ZS-SBIR) task has attracted considerable attention. Although several ZS-SBIR approaches have been proposed, modeling the inherent linkages between the sketch and image domains remains challenging. Moreover, how to transfer semantic knowledge from seen categories to unseen categories is still an open problem that significantly affects retrieval performance. In this article, we propose a novel approach, Modality Fused Class-Proxy with Knowledge Distillation (MFCPKD), which develops two novel schemes to remedy these issues. Specifically, MFCPKD leverages a Modality Fusion Model to learn modality-fused feature embeddings and class proxies. Knowledge distillation is employed so that the student network learns features from seen categories and infers unseen categories through class proxies. Furthermore, three losses constrain the student network to narrow the modality gap between the sketch and image domains. Finally, extensive experiments on three benchmark datasets (Sketchy Ext, TU-Berlin Ext, and QuickDraw Ext) demonstrate that MFCPKD achieves excellent performance compared with existing methods in ZS-SBIR scenarios.
KW - Sketch-based image retrieval
KW - cross modality alignment
KW - knowledge distillation
KW - modality fusion
KW - zero-shot learning
UR - http://www.scopus.com/inward/record.url?scp=85215613770&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2025.3530248
DO - 10.1109/TCSVT.2025.3530248
M3 - Journal article
SN - 1558-2205
SP - 1
EP - 13
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
ER -