Audio-visual speaker recognition via multi-modal correlated neural networks

Jiajia Geng, Xin Liu*, Yiu Ming Cheung

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding, Conference proceeding, peer-review

15 Citations (Scopus)

Abstract

Multi-modal speaker recognition has received considerable attention in recent years owing to growing security demands in real-world applications. In this paper, we present an efficient audio-visual speaker recognition method that fuses face and audio via multi-modal correlated neural networks. In the proposed approach, the facial features learned by convolutional neural networks are compatible with the audio features at a high level, and the heterogeneous multi-modal features can be learned automatically. Accordingly, we propose correlated neural networks that fuse the face and audio modalities at different levels so that the speaker identity can be reliably identified. The experimental results show that the proposed multi-modal speaker recognition approach performs better than a single modality, and that feature-level fusion yields comparable or even better results than decision-level fusion.
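
The distinction the abstract draws between feature-level and decision-level fusion can be illustrated with a minimal NumPy sketch. This is not the authors' correlated network: the function names, feature dimensions, random weights, and the equal weighting of modalities are all hypothetical, chosen only to show where in the pipeline the two modalities are combined.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def feature_level_fusion(face_feat, audio_feat, W, b):
    # Concatenate the two feature vectors, then apply one joint classifier.
    fused = np.concatenate([face_feat, audio_feat], axis=-1)
    return softmax(fused @ W + b)

def decision_level_fusion(face_probs, audio_probs, face_weight=0.5):
    # Each modality is classified separately; only the posteriors are mixed.
    return face_weight * face_probs + (1.0 - face_weight) * audio_probs

# Hypothetical sizes and random stand-ins for learned features and weights.
rng = np.random.default_rng(0)
n_samples, n_speakers, d_face, d_audio = 2, 4, 8, 6
face_feat = rng.normal(size=(n_samples, d_face))
audio_feat = rng.normal(size=(n_samples, d_audio))

W_joint = rng.normal(size=(d_face + d_audio, n_speakers))
b_joint = np.zeros(n_speakers)
W_face = rng.normal(size=(d_face, n_speakers))
W_audio = rng.normal(size=(d_audio, n_speakers))

feature_probs = feature_level_fusion(face_feat, audio_feat, W_joint, b_joint)
decision_probs = decision_level_fusion(
    softmax(face_feat @ W_face), softmax(audio_feat @ W_audio)
)

# Predicted speaker index per sample under each fusion scheme.
feature_pred = feature_probs.argmax(axis=-1)
decision_pred = decision_probs.argmax(axis=-1)
```

In the feature-level case the classifier sees both modalities at once and can exploit correlations between them; in the decision-level case each modality is scored independently and only the resulting posteriors are combined.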

Original language: English
Title of host publication: Proceedings - 2016 IEEE/WIC/ACM International Conference on Web Intelligence Workshops, WIW 2016
Publisher: IEEE
Pages: 123-128
Number of pages: 6
ISBN (Electronic): 9781509047710
Publication status: Published - 11 Jan 2017
Event: 2016 IEEE/WIC/ACM International Conference on Web Intelligence Workshops, WIW 2016 - Omaha, United States
Duration: 13 Oct 2016 to 16 Oct 2016

Publication series

Name: Proceedings - 2016 IEEE/WIC/ACM International Conference on Web Intelligence Workshops, WIW 2016

Conference

Conference: 2016 IEEE/WIC/ACM International Conference on Web Intelligence Workshops, WIW 2016
Country/Territory: United States
City: Omaha
Period: 13/10/16 to 16/10/16

Scopus Subject Areas

  • Education
  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
