On Multiple-Input/Binaural-Output Antiphasic Speaker Signal Extraction

Xianrui Wang; Ningning Pan; Jacob Benesty; Jingdong Chen

doi:10.1109/ICASSP49357.2023.10097239

School of Marine Science and Technology

Research output: Contribution to journal › Conference article › Peer-reviewed

5 Citations (Scopus)

Abstract

This paper studies the problem of target speaker signal extraction and antiphasic rendering with an array of microphones in scenarios where there are two active speakers. Based on important findings from the psychoacoustic field as well as our recent work on single-channel speech enhancement, we present a rendering-based approach in which a temporal convolutional network (TCN) is trained to take the multiple signals observed by the microphone array as its inputs and generate two output (binaural) signals. The TCN is trained in such a way that, when the binaural output signals are listened to by the listener through headsets, the speech signal from the desired speaker is perceived on one side of and close to the listener's head, while the competing speech signal is perceived on the opposite side and also away from the listener's head. Benefiting from the rendering and the signal-to-interference ratio (SIR) improvement, this antiphasic binaural presentation enables the listener to better focus on the target speaker's signal while ignoring the impact of the competing speech. Modified rhyme tests (MRTs) are performed to validate the superiority of the proposed method.

Original language	English
Journal	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
DOI	https://doi.org/10.1109/ICASSP49357.2023.10097239
Publication status	Published - 2023
Event	48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece; Duration: 4 Jun 2023 → 10 Jun 2023

Cite this

@article{798204c906db4d21aa72e140d506ffef,

title = "On Multiple-Input/Binaural-Output Antiphasic Speaker Signal Extraction",

abstract = "This paper studies the problem of target speaker signal extraction and antiphasic rendering with an array of microphones in scenarios where there are two active speakers. Based on important findings from the psychoacoustic field as well as our recent work on single-channel speech enhancement, we present a rendering-based approach in which a temporal convolutional network (TCN) is trained to take the multiple signals observed by the microphone array as its inputs and generate two output (binaural) signals. The TCN is trained in such a way that, when the binaural output signals are listened to by the listener through headsets, the speech signal from the desired speaker is perceived on one side of and close to the listener's head, while the competing speech signal is perceived on the opposite side and also away from the listener's head. Benefiting from the rendering and the signal-to-interference ratio (SIR) improvement, this antiphasic binaural presentation enables the listener to better focus on the target speaker's signal while ignoring the impact of the competing speech. Modified rhyme tests (MRTs) are performed to validate the superiority of the proposed method.",

keywords = "Antiphasic presentation, modified rhyme test, multiple-input/binaural-output, speaker extraction",

author = "Xianrui Wang and Ningning Pan and Jacob Benesty and Jingdong Chen",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 ; Conference date: 04-06-2023 Through 10-06-2023",

year = "2023",

doi = "10.1109/ICASSP49357.2023.10097239",

language = "English",

journal = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

issn = "1520-6149",

}

TY - JOUR

T1 - On Multiple-Input/Binaural-Output Antiphasic Speaker Signal Extraction

AU - Wang, Xianrui

AU - Pan, Ningning

AU - Benesty, Jacob

AU - Chen, Jingdong

PY - 2023

Y1 - 2023

N2 - This paper studies the problem of target speaker signal extraction and antiphasic rendering with an array of microphones in scenarios where there are two active speakers. Based on important findings from the psychoacoustic field as well as our recent work on single-channel speech enhancement, we present a rendering-based approach in which a temporal convolutional network (TCN) is trained to take the multiple signals observed by the microphone array as its inputs and generate two output (binaural) signals. The TCN is trained in such a way that, when the binaural output signals are listened to by the listener through headsets, the speech signal from the desired speaker is perceived on one side of and close to the listener's head, while the competing speech signal is perceived on the opposite side and also away from the listener's head. Benefiting from the rendering and the signal-to-interference ratio (SIR) improvement, this antiphasic binaural presentation enables the listener to better focus on the target speaker's signal while ignoring the impact of the competing speech. Modified rhyme tests (MRTs) are performed to validate the superiority of the proposed method.

AB - This paper studies the problem of target speaker signal extraction and antiphasic rendering with an array of microphones in scenarios where there are two active speakers. Based on important findings from the psychoacoustic field as well as our recent work on single-channel speech enhancement, we present a rendering-based approach in which a temporal convolutional network (TCN) is trained to take the multiple signals observed by the microphone array as its inputs and generate two output (binaural) signals. The TCN is trained in such a way that, when the binaural output signals are listened to by the listener through headsets, the speech signal from the desired speaker is perceived on one side of and close to the listener's head, while the competing speech signal is perceived on the opposite side and also away from the listener's head. Benefiting from the rendering and the signal-to-interference ratio (SIR) improvement, this antiphasic binaural presentation enables the listener to better focus on the target speaker's signal while ignoring the impact of the competing speech. Modified rhyme tests (MRTs) are performed to validate the superiority of the proposed method.

KW - Antiphasic presentation

KW - modified rhyme test

KW - multiple-input/binaural-output

KW - speaker extraction

UR - http://www.scopus.com/inward/record.url?scp=85180014397&partnerID=8YFLogxK

U2 - 10.1109/ICASSP49357.2023.10097239

DO - 10.1109/ICASSP49357.2023.10097239

M3 - Conference article

AN - SCOPUS:85180014397

SN - 1520-6149

JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Y2 - 4 June 2023 through 10 June 2023

ER -
