Abstract
Research in cognitive science has shown that humans often rely on face-voice association in various perception tasks, and several recent data mining works have been designed to emulate this ability. Nevertheless, most methods suffer from degraded performance when semantically irrelevant interference factors exist across modalities. To alleviate this problem, this paper presents an efficient Disentangled Cross-modal Latent Representation (DCLR) method that adaptively disentangles the discriminative feature attributes and enhances face-voice association. Specifically, the proposed DCLR framework consists of a two-stage cross-modal disentangling process. The first stage employs supervised contrastive learning to pull the representations of face-voice data from the same person closer together while pushing the representations of different persons apart. The second stage freezes all parameters of the first stage and introduces a multi-layer orthogonal decoupling scheme to learn the disentangled latent representations while filtering out modality-dependent irrelevant factors. In addition, a cross-modal reconstruction loss is used to narrow the semantic gap between heterogeneous feature representations. By jointly exploiting these components, the proposed framework can effectively associate face-voice data to benefit various cross-modal perception tasks. Extensive experiments verify the superiority of the proposed face-voice association framework and show its competitive performance.
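The loss terms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the temperature value, the face-as-anchor formulation of the supervised contrastive loss, the per-sample squared-cosine form of the orthogonality penalty, and the mean-squared-error reconstruction term are all assumptions chosen for illustration, and the paper's multi-layer decoupling scheme may differ substantially.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against division by zero)."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def cross_modal_supcon_loss(face, voice, labels, temperature=0.1):
    """Supervised contrastive loss with face rows as anchors (assumed form).

    face, voice: (N, d) embeddings; labels: (N,) identity ids shared by both.
    Voice rows with the same identity as the anchor count as positives.
    """
    f, v = l2_normalize(face), l2_normalize(voice)
    logits = f @ v.T / temperature                       # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]).astype(float)
    # Each anchor has at least one positive: the voice row at the same index.
    per_anchor = -(positives * log_prob).sum(axis=1) / positives.sum(axis=1)
    return float(per_anchor.mean())


def orthogonality_penalty(shared, private):
    """Mean squared cosine between identity-related and modality-specific
    vectors of the same sample (assumed form); zero when they are orthogonal."""
    s, p = l2_normalize(shared), l2_normalize(private)
    return float(np.mean(np.sum(s * p, axis=1) ** 2))


def cross_modal_reconstruction_loss(reconstructed, target):
    """Mean squared error between a feature reconstructed from the other
    modality's latent code and the target feature (assumed form)."""
    return float(np.mean((reconstructed - target) ** 2))
```

In a two-stage setup like the one described, the contrastive term would train the first stage, and the orthogonality and reconstruction terms would be applied on top of the frozen first-stage features; how the terms are weighted against each other is not stated in the abstract.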
Original language | English |
---|---|
Title of host publication | 2022 IEEE International Conference on Data Mining (ICDM) |
Publisher | IEEE |
Pages | 648-655 |
Number of pages | 8 |
ISBN (Electronic) | 9781665450997 |
ISBN (Print) | 9781665451000 |
Publication status | Published - 28 Nov 2022 |
Event | 22nd International Conference on Data Mining, ICDM 2022 - Orlando, United States; Duration: 28 Nov 2022 → 1 Dec 2022; https://ieeexplore.ieee.org/xpl/conhome/10027565/proceeding |
Publication series
Name | IEEE International Conference on Data Mining (ICDM) |
---|---|
ISSN (Print) | 1550-4786 |
ISSN (Electronic) | 2374-8486 |
Conference
Conference | 22nd International Conference on Data Mining, ICDM 2022 |
---|---|
Country/Territory | United States |
City | Orlando |
Period | 28/11/22 → 1/12/22 |
User-Defined Keywords
- Face-voice association
- Disentangled latent representation
- Contrastive learning
- Orthogonal decoupling