TY - GEN
T1 - Triplet fusion network hashing for unpaired cross-modal retrieval
AU - Hu, Zhikai
AU - Liu, Xin
AU - Wang, Xingzhi
AU - Cheung, Yiu Ming
AU - Wang, Nannan
AU - Chen, Yewang
N1 - Funding Information:
The work was supported by the National Natural Science Foundation of China (Nos. 61673185, 61672444, 61876142 and 61876068), the Natural Science Foundation of Fujian Province (Nos. 2017J01112 and 2018J01094), the State Key Laboratory of Integrated Services Networks of Xidian University (No. ISN20-11), the Quanzhou City Science & Technology Program of China (No. 2018C107R), the Innovation and Technology Fund (ITF) with Project Code: ITS/339/18, the Initiation Grant for Faculty Niche Research Areas of Hong Kong Baptist University with Project Code: RC-FNRA-IG/18-19/SCI/03, the open project of the Provincial Key Laboratory for Computer Information Processing Technology, Soochow University (No. KJS1839), and in part by the CCF-Tencent Open Fund. Xin Liu is the corresponding author.
PY - 2019/6/5
Y1 - 2019/6/5
AB - With the dramatic increase of multimedia data on the Internet, cross-modal retrieval has become an important and valuable task in search systems. The key challenge of this task is how to build the correlation between multi-modal data. Most existing approaches focus only on paired data, using the pairwise relationships of multi-modal data to explore the correlation between modalities. In practice, however, unpaired data are more common on the Internet, yet few methods pay attention to them. To utilize both paired and unpaired data, we propose a one-stream framework, triplet fusion network hashing (TFNH), which mainly consists of two parts. The first part is a triplet network that handles both kinds of data with the help of a zero-padding operation. The second part consists of two data classifiers, which are used to bridge the gap between paired and unpaired data. In addition, we embed manifold learning into the framework to preserve both inter- and intra-modal similarity, explore the relationship between unpaired and paired data, and bridge the gap between them during the learning process. Extensive experiments show that the proposed approach outperforms several state-of-the-art methods on two datasets in the paired scenario. We further evaluate its ability to handle the unpaired scenario and its robustness with respect to the pairwise constraint. The results show that even when 50% of the data are discarded under the setting in [19], TFNH still performs better than other unpaired approaches, and that even when only 70% of the pairwise relationships are preserved, TFNH can still outperform almost all paired approaches.
KW - Cross-modal retrieval
KW - Hashing technique
KW - Triplet network
KW - Unpaired data
UR - http://www.scopus.com/inward/record.url?scp=85068050953&partnerID=8YFLogxK
U2 - 10.1145/3323873.3325041
DO - 10.1145/3323873.3325041
M3 - Conference proceeding
AN - SCOPUS:85068050953
T3 - ICMR 2019 - Proceedings of the 2019 ACM International Conference on Multimedia Retrieval
SP - 141
EP - 149
BT - ICMR 2019 - Proceedings of the 2019 ACM International Conference on Multimedia Retrieval
PB - Association for Computing Machinery (ACM)
T2 - 2019 ACM International Conference on Multimedia Retrieval, ICMR 2019
Y2 - 10 June 2019 through 13 June 2019
ER -