Abstract
Voice-face association is generally formulated as a cross-modal cognitive matching problem, and recent attention has been paid to the feasibility of devising computational mechanisms for recognizing such associations. Existing works commonly resort to a combination of contrastive learning and classification-based losses to correlate the heterogeneous data. Nevertheless, the resulting reliance on the typical features of each category, known as archetypes, suffers from the weak invariance of modality-specific features within the same identity, which can induce a cross-modal joint feature space with calibration deviations. To tackle these problems, this paper presents an efficient archetype-agnostic framework for reliable voice-face association. First, an Archetype-agnostic Subspace Merging (AaSM) method is designed to perform feature calibration, removing the archetype dependence and facilitating mutual perception of the data. Further, an efficient Bilateral Connection Re-gauging scheme is proposed to quantitatively screen and calibrate the biased data, namely loose pairs that deviate from the joint feature space. In addition, an Instance Equilibrium strategy is dynamically derived to optimize the training process on loose data pairs and significantly improve data utilization. Through the joint exploitation of these components, the proposed framework can effectively associate voice-face data to benefit various cross-modal cognitive tasks. Extensive experiments verify the advantages of the proposed voice-face association framework and show that its performance is competitive with the state of the art.
Original language | English |
---|---|
Title of host publication | MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia |
Publisher | Association for Computing Machinery (ACM) |
Pages | 7056-7064 |
Number of pages | 9 |
ISBN (Print) | 9798400701085 |
Publication status | Published - 27 Oct 2023 |
Event | 31st ACM International Conference on Multimedia, MM 2023 - Ottawa, Canada. Duration: 29 Oct 2023 → 3 Nov 2023. Conference proceedings: https://dl.acm.org/doi/proceedings/10.1145/3581783. Conference website: https://www.acmmm2023.org/ |
Publication series
Name | Proceedings of the ACM International Conference on Multimedia |
---|
Conference
Conference | 31st ACM International Conference on Multimedia, MM 2023 |
---|---|
Country/Territory | Canada |
City | Ottawa |
Period | 29/10/23 → 3/11/23 |
Scopus Subject Areas
- Artificial Intelligence
- Computer Graphics and Computer-Aided Design
- Human-Computer Interaction
- Software
User-Defined Keywords
- archetype-agnostic
- instance equilibrium
- re-gauging
- voice-face association