Abstract
Cross-modal retrieval has received increasing attention for efficient retrieval across different modalities, and hashing techniques have made significant progress recently owing to their low storage cost and high query speed. However, most existing cross-modal hashing methods still face two challenges: narrowing the semantic gap between different modalities and training with imbalanced multi-modal data. This article presents an efficient Adversarial Tri-Fusion Hashing Network (ATFH-N) for cross-modal retrieval, which is among the early attempts to incorporate adversarial learning for working with imbalanced multi-modal data. Specifically, a triple fusion network with a zero-padding operation is proposed to adapt to either balanced or imbalanced multi-modal training data. At the same time, an adversarial training mechanism is leveraged to bridge the semantic gap between the common representations of balanced and imbalanced data. Further, a label prediction network is utilized to guide the feature learning process and promote hash code learning, while the manifold structure is additionally embedded to preserve both inter-modal and intra-modal similarities. Through the joint exploitation of these components, the underlying semantic structure of multimedia data can be well preserved in Hamming space, which can benefit various cross-modal retrieval tasks. Extensive experiments on three benchmark datasets show that the proposed ATFH-N method yields comparable performance in the balanced scenario and brings substantial improvements over state-of-the-art methods in imbalanced scenarios.
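Two ideas mentioned in the abstract, zero padding for samples that lack a modality and retrieval by distance in Hamming space, can be illustrated with a minimal sketch. This is not the ATFH-N architecture: the function names, the plain concatenation used as "fusion", and the sign-based binarization are all assumptions made for illustration, whereas the article describes a learned fusion network trained adversarially.

```python
from typing import List, Optional, Sequence


def fuse_with_zero_padding(
    image_feat: Optional[Sequence[float]],
    text_feat: Optional[Sequence[float]],
    image_dim: int,
    text_dim: int,
) -> List[float]:
    """Concatenate two modality features, zero-filling a missing one.

    Hypothetical helper: concatenation stands in for the learned fusion
    network described in the abstract.
    """
    if image_feat is None and text_feat is None:
        raise ValueError("at least one modality must be present")
    img = list(image_feat) if image_feat is not None else [0.0] * image_dim
    txt = list(text_feat) if text_feat is not None else [0.0] * text_dim
    return img + txt


def binarize(values: Sequence[float]) -> List[int]:
    """Map real-valued outputs to bits by sign (an assumed convention)."""
    return [1 if v >= 0 else 0 for v in values]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(
    query: Sequence[int], database: Sequence[Sequence[int]]
) -> List[int]:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(
        range(len(database)),
        key=lambda i: hamming_distance(query, database[i]),
    )
```

In this sketch a text-only sample still yields a fused vector of the full length, so balanced and imbalanced samples can share one downstream pipeline, and retrieval reduces to sorting short binary codes by the number of differing bits.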
Field | Value |
---|---|
Original language | English |
Article number | 9139424 |
Pages (from-to) | 607-619 |
Number of pages | 13 |
Journal | IEEE Transactions on Emerging Topics in Computational Intelligence |
Volume | 5 |
Issue number | 4 |
Early online date | 13 Jul 2020 |
DOIs | |
Publication status | Published - Aug 2021 |
Scopus Subject Areas
- Artificial Intelligence
- Computer Science Applications
- Computational Mathematics
- Control and Optimization
User-Defined Keywords
- adversarial tri-fusion hashing
- Bridges
- Computer science
- Correlation
- Cross-modal hashing
- imbalanced multi-modal data
- manifold structure
- Manifolds
- Semantics
- Training
- Training data