TY - GEN
T1 - A Single-Input/Binaural-Output Perceptual Rendering Based Speech Separation Method in Noisy Environments
AU - Zheng, Tianqin
AU - Pei, Hanchen
AU - Pan, Ningning
AU - Jin, Jilu
AU - Huang, Gongping
AU - Chen, Jingdong
AU - Benesty, Jacob
N1 - Publisher Copyright: © 2024 APSIPA.
PY - 2024
Y1 - 2024
N2 - In this paper, we address the challenge of single-channel speech separation in noisy environments, where two active speakers and background noise are present in the observed signal. We propose using a dual-path recurrent neural network (DPRNN) to estimate the desired binaural signals from the single-channel noisy input. When the estimated binaural signal is played through headsets, listeners perceive the two speakers as originating from opposite directions, with the background noise coming from a separate direction. Additionally, the background noise is perceived to be farther away than the two speakers, resulting in an improved signal-to-noise ratio (SNR). Research in psychoacoustics indicates that spatial unmasking in the perceptual domain enhances speech intelligibility in complex auditory scenes. This hypothesis is supported by both subjective and objective evaluations, including a significant 26% improvement in modified rhyme test (MRT) scores reported in this paper.
KW - Source separation
KW - binaural hearing
KW - speech enhancement
KW - speech intelligibility
UR - http://www.scopus.com/inward/record.url?scp=85218183200&partnerID=8YFLogxK
U2 - 10.1109/APSIPAASC63619.2025.10848776
DO - 10.1109/APSIPAASC63619.2025.10848776
M3 - Conference contribution
AN - SCOPUS:85218183200
T3 - APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
BT - APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024
Y2 - 3 December 2024 through 6 December 2024
ER -