TY - JOUR
T1 - Unsupervised Dual Deep Hashing with Semantic-Index and Content-Code for Cross-Modal Retrieval
AU - Zhang, Bin
AU - Zhang, Yue
AU - Li, Junyu
AU - Chen, Jiazhou
AU - Akutsu, Tatsuya
AU - Cheung, Yiu-ming
AU - Cai, Hongmin
N1 - This work was supported in part by the Key-Area Research and Development Program of Guangdong Province (2021B0909060002), the National Key Research and Development Program of China (2022YFE0112200), the Science and Technology Project of Guangdong Province (2022A0505050014), the Key-Area Research and Development Program of Guangzhou City (202206030009), the Key-Area of Guangdong Provincial Department of Education (2021ZDZX1005), the National Natural Science Foundation of China (U21A20520, 62325204, 62172112), and the China Postdoctoral Science Foundation (2023M730752).
PY - 2024/9/24
Y1 - 2024/9/24
N2 - Hashing technology has exhibited great cross-modal retrieval potential due to its appealing retrieval efficiency and storage effectiveness. Most current supervised cross-modal retrieval methods heavily rely on accurate semantic supervision, which is intractable for annotations with ever-growing sample sizes. By comparison, the existing unsupervised methods rely on accurate sample similarity preservation strategies with intensive computational costs to compensate for the lack of semantic guidance, which causes these methods to lose the power to bridge the semantic gap. Furthermore, both kinds of approaches need to search for the nearest samples among all samples in a large search space, whose process is laborious. To address these issues, this paper proposes an unsupervised dual deep hashing (UDDH) method with semantic-index and content-code for cross-modal retrieval. Deep hashing networks are utilized to extract deep features and jointly encode the dual hashing codes in a collaborative manner with a common semantic index and modality content codes to simultaneously bridge the semantic and heterogeneous gaps for cross-modal retrieval. The dual deep hashing architecture, comprising the head code on semantic index and tail codes on modality content, enhances the efficiency for cross-modal retrieval. A query sample only needs to search for the retrieved samples with the same semantic index, thus greatly shrinking the search space and achieving superior retrieval efficiency. UDDH integrates the learning processes of deep feature extraction, binary optimization, common semantic index, and modality content code within a unified model, allowing for collaborative optimization to enhance the overall performance. Extensive experiments are conducted to demonstrate the retrieval superiority of the proposed approach over the state-of-the-art baselines.
KW - Binary Optimization
KW - Cross-Modal Retrieval
KW - Deep Hashing
KW - Dual Coding
KW - Retrieval of Similar Content
KW - Sample Assignment
KW - Semantic Index
KW - Unsupervised Learning
UR - http://www.scopus.com/inward/record.url?scp=85204998263&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2024.3467130
DO - 10.1109/TPAMI.2024.3467130
M3 - Journal article
SN - 0162-8828
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
ER -