TY - GEN
T1 - Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning
AU - Xue, Wei
AU - Tong, Ying
AU - Zhang, Chao
AU - Ding, Guohong
AU - He, Xiaodong
AU - Zhou, Bowen
N1 - Publisher Copyright: © 2020 ISCA
PY - 2020/10
Y1 - 2020/10
N2 - The performance of sound event localization and detection (SELD) degrades in source-overlapping cases since features of different sources collapse with each other, and the network tends to fail to learn to separate these features effectively. In this paper, by leveraging conventional microphone array signal processing to generate comprehensive representations for SELD, we propose a new SELD method based on multiple direction of arrival (DOA) beamforming and multi-task learning. By using multiple beamformers to extract the signals from different DOAs, the sound field is more diversely described, and specialised representations of the target source and noises can be obtained. With labelled training data, the steering vector is estimated based on the cross-power spectra (CPS) and the signal presence probability (SPP), which eliminates the need to know the array geometry. We design two networks for sound event detection (SED) and sound source localization (SSL) and use a multi-task learning scheme for SED, in which the SSL-related task acts as a regularization. Experimental results using the database of the DCASE2019 SELD task show that the proposed method achieves state-of-the-art performance.
AB - The performance of sound event localization and detection (SELD) degrades in source-overlapping cases since features of different sources collapse with each other, and the network tends to fail to learn to separate these features effectively. In this paper, by leveraging conventional microphone array signal processing to generate comprehensive representations for SELD, we propose a new SELD method based on multiple direction of arrival (DOA) beamforming and multi-task learning. By using multiple beamformers to extract the signals from different DOAs, the sound field is more diversely described, and specialised representations of the target source and noises can be obtained. With labelled training data, the steering vector is estimated based on the cross-power spectra (CPS) and the signal presence probability (SPP), which eliminates the need to know the array geometry. We design two networks for sound event detection (SED) and sound source localization (SSL) and use a multi-task learning scheme for SED, in which the SSL-related task acts as a regularization. Experimental results using the database of the DCASE2019 SELD task show that the proposed method achieves state-of-the-art performance.
KW - Beamforming
KW - Microphone arrays
KW - Multi-task learning
KW - Sound event localization and detection
UR - http://www.scopus.com/inward/record.url?scp=85098135575&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2020-2759
DO - 10.21437/Interspeech.2020-2759
M3 - Conference proceeding
AN - SCOPUS:85098135575
VL - 2020-October
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 5091
EP - 5095
BT - Proceedings of 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
PB - International Speech Communication Association
T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Y2 - 25 October 2020 through 29 October 2020
ER -