Taking a Part for the Whole: An Archetype-agnostic Framework for Voice-Face Association

Guancheng Chen, Xin Liu*, Xing Xu, Yiu Ming Cheung*, Taihao Li

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding > Conference proceeding > peer-review


Voice-face association is generally framed as a cross-modal cognitive matching problem, and recent work has examined the feasibility of devising computational mechanisms for recognizing such associations. Existing works commonly combine contrastive learning with a classification-based loss to correlate the heterogeneous data. Nevertheless, this combination relies on the typical features of each category, known as archetypes, and therefore suffers from the weak invariance of modality-specific features within the same identity, which might induce a cross-modal joint feature space with calibration deviations. To tackle these problems, this paper presents an efficient archetype-agnostic framework for reliable voice-face association. First, an Archetype-agnostic Subspace Merging (AaSM) method is designed to perform feature calibration, removing the archetype dependence and facilitating the mutual perception of the data. Further, an efficient Bilateral Connection Re-gauging scheme is proposed to quantitatively screen and calibrate the biased data, namely loose pairs that deviate from the joint feature space. In addition, an Instance Equilibrium strategy is dynamically derived to optimize training on loose data pairs and significantly improve data utilization. Through the joint exploitation of these components, the proposed framework can reliably associate voice-face data, benefiting various kinds of cross-modal cognitive tasks. Extensive experiments verify the advantages of the proposed voice-face association framework and show that its performance is competitive with the state of the art.
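
The baseline recipe mentioned in the abstract, contrastive learning combined with a classification-based loss, can be illustrated with a minimal NumPy sketch. This is not the paper's method: it does not implement AaSM, Bilateral Connection Re-gauging, or Instance Equilibrium, whose details are not given in this record. The function names, the symmetric InfoNCE-style formulation, the `temperature` and `alpha` parameters, and the linear identity classifier `class_weights` are all assumptions made for illustration. The in-batch contrastive term also assumes that row i of the voice batch and row i of the face batch belong to the same identity, and that each identity appears only once per batch.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def contrastive_loss(voice, face, temperature=0.1):
    # Symmetric InfoNCE-style loss (an assumed formulation):
    # the matching voice/face pair sits on the diagonal of the similarity matrix.
    v, f = l2_normalize(voice), l2_normalize(face)
    logits = v @ f.T / temperature
    idx = np.arange(len(v))
    voice_to_face = -log_softmax(logits)[idx, idx].mean()
    face_to_voice = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (voice_to_face + face_to_voice)

def classification_loss(embeddings, labels, class_weights):
    # Cross-entropy of a hypothetical linear identity classifier.
    logits = embeddings @ class_weights
    return -log_softmax(logits)[np.arange(len(labels)), labels].mean()

def combined_loss(voice, face, labels, class_weights, alpha=1.0):
    # Contrastive term plus identity classification on both modalities;
    # alpha is a hypothetical weighting between the two terms.
    cls = (classification_loss(voice, labels, class_weights)
           + classification_loss(face, labels, class_weights))
    return contrastive_loss(voice, face) + alpha * cls
```

In this sketch the classification term pulls each embedding toward a per-identity weight vector, which is one way to read the "archetype" dependence the abstract criticizes; the paper's own formulation may differ.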

Original language: English
Title of host publication: MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia
Publisher: Association for Computing Machinery (ACM)
Number of pages: 9
ISBN (Print): 9798400701085
Publication status: Published - 27 Oct 2023
Event: 31st ACM International Conference on Multimedia, MM 2023 - Ottawa, Canada
Duration: 29 Oct 2023 to 3 Nov 2023
https://dl.acm.org/doi/proceedings/10.1145/3581783 (Conference proceedings)
https://www.acmmm2023.org/ (Conference website)

Publication series

Name: Proceedings of the ACM International Conference on Multimedia


Conference: 31st ACM International Conference on Multimedia, MM 2023

Scopus Subject Areas

  • Artificial Intelligence
  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
  • Software

User-Defined Keywords

  • archetype-agnostic
  • instance equilibrium
  • re-gauging
  • voice-face association

