Modality Fused Class-Proxy with Knowledge Distillation for Zero-Shot Sketch-based Image Retrieval

Chang-Xing Li, Donglin Zhang*, Zhikai Hu, Xiao-Jun Wu

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

Abstract

In recent years, the zero-shot sketch-based image retrieval (ZS-SBIR) task has attracted considerable attention. Although several ZS-SBIR approaches have been proposed, modeling the inherent linkages between the sketch and image domains remains challenging. Moreover, how to transfer semantic knowledge from seen categories to unseen categories is still an open problem that significantly affects retrieval performance. In this article, we propose a novel approach, Modality Fused Class-Proxy with Knowledge Distillation (MFCPKD), which introduces two novel schemes to remedy these issues. Specifically, MFCPKD leverages a Modality Fusion Model to learn modality-fused feature embeddings and class proxies. Knowledge distillation is employed so that the student network learns features from seen categories and infers unseen categories through the class proxies. Furthermore, three losses constrain the student network to narrow the modality gap between the sketch and image domains. Finally, we conduct extensive experiments on three benchmark datasets (Sketchy Ext, TU-Berlin Ext, and QuickDraw Ext) and demonstrate that our MFCPKD method achieves excellent performance compared with existing methods in ZS-SBIR scenarios.
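The abstract describes distilling knowledge from a modality-fused teacher into a student via class proxies. The following is a minimal, hypothetical sketch of that general idea (not the authors' implementation): student and teacher embeddings are compared against a set of class-proxy vectors by cosine similarity, and a temperature-scaled KL divergence pulls the student's proxy distribution toward the teacher's. All names, shapes, and the choice of loss are illustrative assumptions.

```python
# Illustrative proxy-based distillation sketch; all details are assumptions,
# not the MFCPKD paper's actual architecture or loss formulation.
import numpy as np

def softmax(logits, temperature=1.0):
    # Numerically stable softmax with temperature scaling.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def proxy_logits(embeddings, proxies):
    # Cosine similarity between L2-normalised embeddings and class proxies.
    e = embeddings / np.linalg.norm(embeddings, axis=-1, keepdims=True)
    p = proxies / np.linalg.norm(proxies, axis=-1, keepdims=True)
    return e @ p.T

def distillation_loss(student_emb, teacher_emb, proxies, temperature=4.0):
    # Mean KL divergence KL(teacher || student) over the batch, computed on
    # the distributions each embedding induces over the class proxies.
    s = softmax(proxy_logits(student_emb, proxies), temperature)
    t = softmax(proxy_logits(teacher_emb, proxies), temperature)
    kl = np.sum(t * (np.log(t + 1e-12) - np.log(s + 1e-12)), axis=-1)
    return float(np.mean(kl))

rng = np.random.default_rng(0)
proxies = rng.normal(size=(10, 64))                   # 10 seen-class proxies
teacher = rng.normal(size=(4, 64))                    # fused teacher embeddings
student = teacher + 0.1 * rng.normal(size=(4, 64))    # student near the teacher
print(distillation_loss(student, teacher, proxies))   # small, non-negative
```

At test time, a proxy for an unseen class could be scored the same way, which is one common reading of how class proxies support zero-shot inference.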
Original language: English
Pages (from-to): 1-13
Number of pages: 13
Journal: IEEE Transactions on Circuits and Systems for Video Technology
DOIs
Publication status: E-pub ahead of print - 15 Jan 2025

User-Defined Keywords

  • Sketch-based image retrieval
  • cross modality alignment
  • knowledge distillation
  • modality fusion
  • zero-shot learning
