TY - GEN
T1 - Learning Discriminative Joint Embeddings for Efficient Face and Voice Association
AU - Wang, Rui
AU - Liu, Xin
AU - Cheung, Yiu Ming
AU - Cheng, Kai
AU - Wang, Nannan
AU - Fan, Wentao
N1 - Funding Information:
This work is supported by the National Science Foundation of China (Nos. 61672444, 61673185, 61876142, 61876068, and 61922066), the State Key Lab. of ISN of Xidian University (No. ISN20-11), the Quanzhou City Science & Technology Program of China (No. 2018C107R), the Promotion Program for Graduate Students in Scientific Research and Innovation Ability of Huaqiao University (No. 18014083018), IG-FNRA of HKBU under Grant RC-FNRA-IG/18-19/SCI/03, and ITF of ITC of the Hong Kong SAR under Project ITS/339/18.
PY - 2020/7/25
Y1 - 2020/7/25
N2 - Many cognitive studies have shown a natural association between faces and voices, and this potential association has attracted much attention in the biometric cross-modal retrieval domain. Nevertheless, existing methods often fail to explicitly learn common embeddings for challenging face-voice association tasks. In this paper, we propose to learn discriminative joint embeddings for face-voice association, seamlessly training a face subnetwork and a voice subnetwork to learn high-level semantic features while correlating them so they can be compared directly and efficiently. Within the proposed approach, we introduce a bi-directional ranking constraint, an identity constraint, and a center constraint to learn the joint face-voice embedding, and adopt a bi-directional training strategy to train the deep correlated face-voice model. Meanwhile, an online hard negative mining technique is utilized to discriminatively construct hard triplets in a mini-batch manner, thereby speeding up the learning process. Accordingly, the proposed approach is adaptable to various face-voice association tasks, including cross-modal verification, 1:2 matching, 1:N matching, and retrieval scenarios. Extensive experiments show improved performance in comparison with state-of-the-art methods.
AB - Many cognitive studies have shown a natural association between faces and voices, and this potential association has attracted much attention in the biometric cross-modal retrieval domain. Nevertheless, existing methods often fail to explicitly learn common embeddings for challenging face-voice association tasks. In this paper, we propose to learn discriminative joint embeddings for face-voice association, seamlessly training a face subnetwork and a voice subnetwork to learn high-level semantic features while correlating them so they can be compared directly and efficiently. Within the proposed approach, we introduce a bi-directional ranking constraint, an identity constraint, and a center constraint to learn the joint face-voice embedding, and adopt a bi-directional training strategy to train the deep correlated face-voice model. Meanwhile, an online hard negative mining technique is utilized to discriminatively construct hard triplets in a mini-batch manner, thereby speeding up the learning process. Accordingly, the proposed approach is adaptable to various face-voice association tasks, including cross-modal verification, 1:2 matching, 1:N matching, and retrieval scenarios. Extensive experiments show improved performance in comparison with state-of-the-art methods.
KW - bi-directional ranking constraint
KW - cross-modal verification
KW - discriminative joint embedding
KW - face-voice association
UR - http://www.scopus.com/inward/record.url?scp=85090137902&partnerID=8YFLogxK
U2 - 10.1145/3397271.3401302
DO - 10.1145/3397271.3401302
M3 - Conference proceeding
AN - SCOPUS:85090137902
T3 - SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 1881
EP - 1884
BT - SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery (ACM)
T2 - 43rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020
Y2 - 25 July 2020 through 30 July 2020
ER -