Learning Discriminative Joint Embeddings for Efficient Face and Voice Association

Rui Wang, Xin Liu, Yiu Ming CHEUNG, Kai Cheng, Nannan Wang, Wentao Fan

Research output: Chapter in book/report/conference proceeding › Conference contribution › peer-review

Abstract

Cognitive research has shown that faces and voices can be naturally associated, and this potential association has attracted considerable attention in biometric cross-modal retrieval. Nevertheless, existing methods often fail to explicitly learn common embeddings for challenging face-voice association tasks. In this paper, we propose learning a discriminative joint embedding for face-voice association, which seamlessly trains the face subnetwork and the voice subnetwork to learn their high-level semantic features while correlating them so that they can be compared directly and efficiently. Within the proposed approach, we introduce a bi-directional ranking constraint, an identity constraint and a center constraint to learn the joint face-voice embedding, and adopt a bi-directional training strategy to train the deep correlated face-voice model. Meanwhile, an online hard negative mining technique is used to construct hard triplets within each mini-batch, which speeds up the learning process. Accordingly, the proposed approach adapts to various face-voice association tasks, including cross-modal verification, 1:2 matching, 1:N matching, and retrieval scenarios. Extensive experiments show improved performance in comparison with state-of-the-art methods.
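
The bi-directional ranking constraint with online hard negative mining mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss formula, so the Euclidean distance, the margin value, the rule of picking the closest different-identity sample in the mini-batch as the hard negative, and all function names below are assumptions made for illustration only. The identity and center constraints are not covered.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_direction_loss(anchors, others, labels, margin):
    """Mean triplet hinge loss from one modality to the other.

    anchors[i] and others[i] are assumed to belong to the same identity
    labels[i]. For each anchor, the hardest negative is taken to be the
    closest sample in `others` with a different label (assumed mining rule).
    """
    total, count = 0.0, 0
    for i, anchor in enumerate(anchors):
        pos_dist = euclidean(anchor, others[i])
        neg_dists = [
            euclidean(anchor, others[j])
            for j in range(len(others))
            if labels[j] != labels[i]
        ]
        if not neg_dists:
            continue  # no negative for this anchor in the mini-batch
        hardest_neg = min(neg_dists)
        total += max(0.0, margin + pos_dist - hardest_neg)
        count += 1
    return total / count if count else 0.0


def bidirectional_ranking_loss(faces, voices, labels, margin=0.5):
    """Sum of the face-to-voice and voice-to-face ranking losses."""
    return (
        one_direction_loss(faces, voices, labels, margin)
        + one_direction_loss(voices, faces, labels, margin)
    )


if __name__ == "__main__":
    # Toy mini-batch: two identities with 2-D embeddings (hypothetical data).
    faces = [[0.0, 0.0], [1.0, 0.0]]
    voices = [[0.1, 0.0], [0.9, 0.0]]
    print(bidirectional_ranking_loss(faces, voices, labels=[0, 1]))
```

In this sketch the loss is zero once every matched face-voice pair is closer than the hardest mismatched pair by at least the margin, in both retrieval directions.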

Original language: English
Title of host publication: SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 1881-1884
Number of pages: 4
ISBN (Electronic): 9781450380164
Publication status: Published - 25 Jul 2020
Event: 43rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020 - Virtual, Online, China
Duration: 25 Jul 2020 to 30 Jul 2020

Publication series

Name: SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 43rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020
Country/Territory: China
City: Virtual, Online
Period: 25/07/20 to 30/07/20

Scopus Subject Areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems
  • Software

User-Defined Keywords

  • bi-directional ranking constraint
  • cross-modal verification
  • discriminative joint embedding
  • face-voice association
