TY - GEN
T1 - Full-Sphere Binaural Sound Source Localization Using Multi-task Neural Network
AU - Yang, Yichen
AU - Xi, Jingwei
AU - Zhang, Wen
AU - Zhang, Lijun
N1 - Publisher Copyright:
© 2020 APSIPA.
PY - 2020/12/7
Y1 - 2020/12/7
N2 - The accuracy of binaural sound source localization is faced with the challenge of localizing azimuth and elevation simultaneously in noisy and reverberant environments. In this work, a full-sphere binaural sound source localization system is proposed using convolutional neural network and multi-task neural network connected to learn the localization features. The log-magnitudes and interaural phase difference (IPD) of binaural signals are used as inputs to a two-branch convolutional neural network, from which interaural and monaural cues are extracted and combined. Then, the full-sphere localization is formulated as two subtasks of estimating azimuth and elevation separately using multi-task neural network. To reduce reverberation effects, the interaural coherence based pre-processing is used to select the direct-path dominated time-frequency bins for localization. The proposed system is evaluated at a variety of noise and reverberation conditions, in comparison with two baseline systems. The results indicate that the proposed system achieves better localization performance, especially for elevation estimation, at low SNR and strong reverberation conditions.
AB - The accuracy of binaural sound source localization is faced with the challenge of localizing azimuth and elevation simultaneously in noisy and reverberant environments. In this work, a full-sphere binaural sound source localization system is proposed using convolutional neural network and multi-task neural network connected to learn the localization features. The log-magnitudes and interaural phase difference (IPD) of binaural signals are used as inputs to a two-branch convolutional neural network, from which interaural and monaural cues are extracted and combined. Then, the full-sphere localization is formulated as two subtasks of estimating azimuth and elevation separately using multi-task neural network. To reduce reverberation effects, the interaural coherence based pre-processing is used to select the direct-path dominated time-frequency bins for localization. The proposed system is evaluated at a variety of noise and reverberation conditions, in comparison with two baseline systems. The results indicate that the proposed system achieves better localization performance, especially for elevation estimation, at low SNR and strong reverberation conditions.
UR - http://www.scopus.com/inward/record.url?scp=85100935485&partnerID=8YFLogxK
M3 - 会议稿件
AN - SCOPUS:85100935485
T3 - 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings
SP - 432
EP - 436
BT - 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020
Y2 - 7 December 2020 through 10 December 2020
ER -