Abstract
Fine-grained image-text retrieval is a challenging but vital technology in the field of multimedia analysis. Existing methods mainly focus on learning a common embedding space for images (or patches) and sentences (or words), in which the similarity of their mapped features can be measured directly. Nevertheless, most existing image-text retrieval works rarely consider the shared semantic concepts that potentially correlate the heterogeneous modalities, even though these concepts can enhance the discriminative power of the learned embedding space. Toward this end, we propose a Cross-Graph Attention model (CGAM) to explicitly learn the shared semantic concepts, which can then guide the feature learning process of each modality and promote common embedding learning. More specifically, we build a semantic-embedded graph for each modality and smooth the discrepancy between the two modalities via a cross-graph attention model to obtain shared semantic-enhanced features. Meanwhile, we reconstruct image and text features from the shared semantic concepts and the original embedding representations, and leverage a multi-head mechanism for similarity calculation. Accordingly, a discriminative, semantic-enhanced cross-modal embedding between image and text is obtained, which benefits fine-grained retrieval. Extensive experiments on benchmark datasets show performance improvements over state-of-the-art methods.
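As a rough illustration of the two mechanisms named in the abstract (attention across the nodes of two modality graphs, and a multi-head similarity score), the following NumPy sketch may help. It is not the authors' implementation: the function names, the scaled dot-product attention, the residual fusion, the mean pooling, and the number of heads are all assumptions made for illustration, and graph edges are omitted entirely.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys):
    """For each query node, return an attention-weighted mix of key nodes.

    Hypothetical helper: scaled dot-product attention from the nodes of one
    modality (queries) over the nodes of the other modality (keys).
    """
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)  # (n_q, n_k)
    return weights @ keys                                      # (n_q, d)

def multi_head_similarity(img, txt, num_heads=4):
    """Mean cosine similarity over num_heads equal slices of pooled features.

    Assumption: each modality is mean-pooled over its nodes, then the pooled
    vector is split into heads; the feature size must be divisible by num_heads.
    """
    img_vec, txt_vec = img.mean(axis=0), txt.mean(axis=0)
    sims = []
    for a, b in zip(np.split(img_vec, num_heads), np.split(txt_vec, num_heads)):
        sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return float(np.mean(sims))

# Toy data: 5 image-region nodes and 7 word nodes, 16-dimensional features.
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 16))
words = rng.normal(size=(7, 16))

# Residual fusion (an assumption): original features plus cross-modal context.
regions_enh = regions + cross_attend(regions, words)
words_enh = words + cross_attend(words, regions)

score = multi_head_similarity(regions_enh, words_enh)
```

In this sketch each modality keeps its own node count after attention, so the enhanced features can still be pooled and compared independently; a trained model would replace the random features and add learned projections.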
Original language | English |
---|---|
Title of host publication | SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval |
Publisher | Association for Computing Machinery (ACM) |
Pages | 1865-1869 |
Number of pages | 5 |
ISBN (Print) | 9781450380379 |
Publication status | Published - 11 Jul 2021 |
Event | 44th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2021 - Virtual, Montreal, Canada Duration: 11 Jul 2021 → 15 Jul 2021 https://sigir.org/sigir2021/ |
Conference
Conference | 44th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2021 |
---|---|
Country/Territory | Canada |
City | Montreal |
Period | 11/07/21 → 15/07/21 |
User-Defined Keywords
- Image-text retrieval
- cross-graph attention
- shared semantic concept
- multi-head mechanism