TY - JOUR
T1 - On Multiple-Input/Binaural-Output Antiphasic Speaker Signal Extraction
AU - Wang, Xianrui
AU - Pan, Ningning
AU - Benesty, Jacob
AU - Chen, Jingdong
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - This paper studies the problem of target speaker signal extraction and antiphasic rendering with an array of microphones in scenarios where there are two active speakers. Building on important findings in psychoacoustics as well as our recent work on single-channel speech enhancement, we present a rendering-based approach in which a temporal convolutional network (TCN) is trained to take the multiple signals observed by the microphone array as its inputs and generate two output (binaural) signals. The TCN is trained in such a way that, when the listener hears the binaural output signals through a headset, the speech signal from the desired speaker is perceived on one side of and close to the listener's head, while the competing speech signal is perceived on the opposite side and farther from the listener's head. Benefiting from the rendering and the signal-to-interference ratio (SIR) improvement, this antiphasic binaural presentation enables the listener to better focus on the target speaker's signal while ignoring the competing speech. Modified rhyme tests (MRTs) are performed to validate the superiority of the proposed method.
AB - This paper studies the problem of target speaker signal extraction and antiphasic rendering with an array of microphones in scenarios where there are two active speakers. Building on important findings in psychoacoustics as well as our recent work on single-channel speech enhancement, we present a rendering-based approach in which a temporal convolutional network (TCN) is trained to take the multiple signals observed by the microphone array as its inputs and generate two output (binaural) signals. The TCN is trained in such a way that, when the listener hears the binaural output signals through a headset, the speech signal from the desired speaker is perceived on one side of and close to the listener's head, while the competing speech signal is perceived on the opposite side and farther from the listener's head. Benefiting from the rendering and the signal-to-interference ratio (SIR) improvement, this antiphasic binaural presentation enables the listener to better focus on the target speaker's signal while ignoring the competing speech. Modified rhyme tests (MRTs) are performed to validate the superiority of the proposed method.
KW - Antiphasic presentation
KW - modified rhyme test
KW - multiple-input/binaural-output
KW - speaker extraction
UR - http://www.scopus.com/inward/record.url?scp=85180014397&partnerID=8YFLogxK
U2 - 10.1109/ICASSP49357.2023.10097239
DO - 10.1109/ICASSP49357.2023.10097239
M3 - Conference article
AN - SCOPUS:85180014397
SN - 1520-6149
JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Y2 - 4 June 2023 through 10 June 2023
ER -