Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning

Wei Xue, Ying Tong, Chao Zhang, Guohong Ding, Xiaodong He, Bowen Zhou

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

10 Citations (Scopus)

Abstract

The performance of sound event localization and detection (SELD) degrades when sources overlap, since the features of different sources interfere with each other and the network tends to fail to learn to separate them effectively. In this paper, we leverage conventional microphone array signal processing to generate comprehensive representations for SELD and propose a new SELD method based on multiple direction of arrival (DOA) beamforming and multi-task learning. By using multiple beamformers to extract the signals from different DOAs, the sound field is described more diversely, and specialised representations of the target source and noises can be obtained. With labelled training data, the steering vector is estimated based on the cross-power spectra (CPS) and the signal presence probability (SPP), which eliminates the need to know the array geometry. We design two networks for sound event detection (SED) and sound source localization (SSL) and use a multi-task learning scheme for SED, in which the SSL-related task acts as a regularization. Experimental results on the DCASE2019 SELD task database show that the proposed method achieves state-of-the-art performance.
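The geometry-free steering vector estimation mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the estimator, so the sketch assumes one common choice, namely taking the principal eigenvector of an SPP-weighted CPS matrix per frequency bin, followed by a simple matched (delay-and-sum style) beamformer. The function names `estimate_steering_vector` and `beamform`, the array layouts, and the reference-microphone normalisation are all hypothetical.

```python
import numpy as np

def estimate_steering_vector(stft, spp, ref_mic=0):
    """Assumed estimator: principal eigenvector of the SPP-weighted CPS matrix.

    stft: complex array, shape (mics, frames, freqs)
    spp:  real array in [0, 1], shape (frames, freqs)
    Returns steering vectors, shape (freqs, mics), scaled so that the
    reference microphone entry equals 1.
    """
    # SPP-weighted cross-power spectral matrix per frequency: (freqs, mics, mics)
    weighted = stft * spp[None, :, :]
    cps = np.einsum("mtf,ntf->fmn", weighted, stft.conj())
    cps /= np.maximum(spp.sum(axis=0), 1e-8)[:, None, None]
    # eigh returns eigenvalues in ascending order, so the last column
    # is the principal eigenvector of each Hermitian CPS matrix.
    _, vecs = np.linalg.eigh(cps)
    d = vecs[:, :, -1]
    # Remove the arbitrary complex scale by normalising to the reference mic.
    ref = d[:, ref_mic:ref_mic + 1]
    return d / np.where(np.abs(ref) > 1e-8, ref, 1.0)

def beamform(stft, steering):
    """Matched beamformer w = d / (d^H d), applied as y = w^H x per bin.

    Returns the beamformed signal, shape (frames, freqs).
    """
    norm = np.sum(np.abs(steering) ** 2, axis=1, keepdims=True)
    weights = steering / norm
    return np.einsum("fm,mtf->tf", weights.conj(), stft)
```

Under these assumptions, one steering vector would be estimated per DOA from the labelled training segments, and applying `beamform` once per steering vector yields the multiple beamformed signals that serve as network inputs.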

Original language: English
Title of host publication: Proceedings of 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Publisher: International Speech Communication Association
Pages: 5091-5095
Number of pages: 5
Volume: 2020-October
Publication status: Published - Oct 2020
Externally published: Yes
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 25 Oct 2020 to 29 Oct 2020

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Country/Territory: China
City: Shanghai
Period: 25/10/20 to 29/10/20

Scopus Subject Areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

User-Defined Keywords

  • Beamforming
  • Microphone arrays
  • Multi-task learning
  • Sound event localization and detection
