A Single-Input/Binaural-Output Perceptual Rendering Based Speech Separation Method in Noisy Environments

Tianqin Zheng; Hanchen Pei; Ningning Pan; Jilu Jin; Gongping Huang; Jingdong Chen; Jacob Benesty

doi:10.1109/APSIPAASC63619.2025.10848776

School of Marine Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we address the challenge of single-channel speech separation in noisy environments, where the observed signal contains two active speakers and background noise. We propose using a dual-path recurrent neural network (DPRNN) to estimate the desired binaural signals from the single-channel noisy input. When the estimated binaural signals are played through headsets, listeners perceive the two speakers as coming from opposite directions and the background noise as coming from a separate direction. The background noise is also perceived as farther away than the two speakers, which results in an improved signal-to-noise ratio (SNR). Research in psychoacoustics indicates that such spatial unmasking in the perceptual domain enhances speech intelligibility in complex auditory scenes. We therefore hypothesize that the proposed rendering improves intelligibility. Both subjective and objective evaluations support this hypothesis, including a significant 26% improvement in modified rhyme test (MRT) scores.
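The abstract does not say how the desired binaural signals are constructed, so the following is only a minimal sketch of the kind of spatial rendering it describes, not the authors' method. It assumes a crude spherical-head model in place of measured head-related transfer functions (HRTFs). The model uses an interaural time difference from Woodworth's formula, a broadband interaural level difference that grows with |sin(azimuth)| up to a hypothetical 10 dB, and 1/r level attenuation relative to 1 m. The speakers are placed at -90° and +90° (opposite sides) and the noise in front at a hypothetical 4 m. All function names, angles, distances, and the head radius are illustrative assumptions.

```python
import numpy as np

def woodworth_itd(azimuth_deg, head_radius=0.0875, c=343.0):
    """Interaural time difference in seconds (Woodworth spherical-head formula).

    Assumed convention: negative azimuth = listener's left, positive = right.
    The formula is only meant for |azimuth| <= 90 degrees.
    """
    theta = np.deg2rad(azimuth_deg)
    return (head_radius / c) * (theta + np.sin(theta))

def delay(x, n):
    """Delay x by n >= 0 whole samples, zero-padding the start and keeping len(x)."""
    n = min(n, len(x))
    if n == 0:
        return x.copy()
    return np.concatenate([np.zeros(n), x[:-n]])

def render_source(x, azimuth_deg, fs, distance=1.0, max_ild_db=10.0):
    """Return a (2, len(x)) array (row 0 = left ear, row 1 = right ear)."""
    lag = int(round(abs(woodworth_itd(azimuth_deg)) * fs))  # ITD rounded to samples
    # Hypothetical broadband ILD: 0 dB in front, max_ild_db at +/-90 degrees.
    far_gain = 10.0 ** (-max_ild_db * abs(np.sin(np.deg2rad(azimuth_deg))) / 20.0)
    g = 1.0 / distance  # inverse-distance law relative to a 1 m reference
    near = g * x
    far = g * far_gain * delay(x, lag)  # far ear: later and quieter
    if azimuth_deg >= 0:  # source on the right (or straight ahead)
        return np.stack([far, near])
    return np.stack([near, far])

def render_scene(speaker1, speaker2, noise, fs, noise_azimuth_deg=0.0, noise_distance=4.0):
    """Speaker 1 at -90 deg, speaker 2 at +90 deg, noise farther away in front."""
    return (render_source(speaker1, -90.0, fs)
            + render_source(speaker2, 90.0, fs)
            + render_source(noise, noise_azimuth_deg, fs, distance=noise_distance))

fs = 16000
rng = np.random.default_rng(0)
s1, s2, noise = (rng.standard_normal(fs) for _ in range(3))
binaural = render_scene(s1, s2, noise, fs)
print(binaural.shape)  # (2, 16000)
```

Because the noise is rendered farther away, its level in this toy model drops by 20·log10(4) ≈ 12 dB relative to a source at 1 m, which is one simple way a rendering can raise the SNR delivered to the listener. A realistic system would instead filter each source with HRTFs or binaural room impulse responses, which also provide spectral and reverberant distance cues.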

Original language	English
Title of host publication	APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9798350367331
DOIs	https://doi.org/10.1109/APSIPAASC63619.2025.10848776
State	Published - 2024
Event	2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024 - Macau, China
Duration	3 Dec 2024 → 6 Dec 2024

Publication series

Name	APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024

Conference

Conference	2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024
Country/Territory	China
City	Macau
Period	3/12/24 → 6/12/24

Keywords

Source separation
binaural hearing
speech enhancement
speech intelligibility

Cite this

Zheng, T., Pei, H., Pan, N., Jin, J., Huang, G., Chen, J., & Benesty, J. (2024). A Single-Input/Binaural-Output Perceptual Rendering Based Speech Separation Method in Noisy Environments. In APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024 (APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/APSIPAASC63619.2025.10848776

Zheng, Tianqin; Pei, Hanchen; Pan, Ningning et al. / A Single-Input/Binaural-Output Perceptual Rendering Based Speech Separation Method in Noisy Environments. APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024. Institute of Electrical and Electronics Engineers Inc., 2024. (APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024).

@inproceedings{f69b95118f904532af447da90c4ef129,
title = "A Single-Input/Binaural-Output Perceptual Rendering Based Speech Separation Method in Noisy Environments",
abstract = "In this paper, we address the challenge of single-channel speech separation in noisy environments, where the observed signal contains two active speakers and background noise. We propose using a dual-path recurrent neural network (DPRNN) to estimate the desired binaural signals from the single-channel noisy input. When the estimated binaural signals are played through headsets, listeners perceive the two speakers as coming from opposite directions and the background noise as coming from a separate direction. The background noise is also perceived as farther away than the two speakers, which results in an improved signal-to-noise ratio (SNR). Research in psychoacoustics indicates that such spatial unmasking in the perceptual domain enhances speech intelligibility in complex auditory scenes. We therefore hypothesize that the proposed rendering improves intelligibility. Both subjective and objective evaluations support this hypothesis, including a significant 26% improvement in modified rhyme test (MRT) scores.",
keywords = "Source separation, binaural hearing, speech enhancement, speech intelligibility",
author = "Tianqin Zheng and Hanchen Pei and Ningning Pan and Jilu Jin and Gongping Huang and Jingdong Chen and Jacob Benesty",
note = "Publisher Copyright: {\textcopyright} 2024 APSIPA.; 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024 ; Conference date: 03-12-2024 Through 06-12-2024",
year = "2024",
doi = "10.1109/APSIPAASC63619.2025.10848776",
language = "English",
series = "APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
booktitle = "APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024",
}

Zheng, T, Pei, H, Pan, N, Jin, J, Huang, G, Chen, J & Benesty, J 2024, A Single-Input/Binaural-Output Perceptual Rendering Based Speech Separation Method in Noisy Environments. in APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024. APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024, Institute of Electrical and Electronics Engineers Inc., 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024, Macau, China, 3/12/24. https://doi.org/10.1109/APSIPAASC63619.2025.10848776

A Single-Input/Binaural-Output Perceptual Rendering Based Speech Separation Method in Noisy Environments. / Zheng, Tianqin; Pei, Hanchen; Pan, Ningning et al.
APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024. Institute of Electrical and Electronics Engineers Inc., 2024. (APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024).

TY  - GEN
T1  - A Single-Input/Binaural-Output Perceptual Rendering Based Speech Separation Method in Noisy Environments
AU  - Zheng, Tianqin
AU  - Pei, Hanchen
AU  - Pan, Ningning
AU  - Jin, Jilu
AU  - Huang, Gongping
AU  - Chen, Jingdong
AU  - Benesty, Jacob
PY  - 2024
Y1  - 2024
N2  - In this paper, we address the challenge of single-channel speech separation in noisy environments, where the observed signal contains two active speakers and background noise. We propose using a dual-path recurrent neural network (DPRNN) to estimate the desired binaural signals from the single-channel noisy input. When the estimated binaural signals are played through headsets, listeners perceive the two speakers as coming from opposite directions and the background noise as coming from a separate direction. The background noise is also perceived as farther away than the two speakers, which results in an improved signal-to-noise ratio (SNR). Research in psychoacoustics indicates that such spatial unmasking in the perceptual domain enhances speech intelligibility in complex auditory scenes. We therefore hypothesize that the proposed rendering improves intelligibility. Both subjective and objective evaluations support this hypothesis, including a significant 26% improvement in modified rhyme test (MRT) scores.
AB  - In this paper, we address the challenge of single-channel speech separation in noisy environments, where the observed signal contains two active speakers and background noise. We propose using a dual-path recurrent neural network (DPRNN) to estimate the desired binaural signals from the single-channel noisy input. When the estimated binaural signals are played through headsets, listeners perceive the two speakers as coming from opposite directions and the background noise as coming from a separate direction. The background noise is also perceived as farther away than the two speakers, which results in an improved signal-to-noise ratio (SNR). Research in psychoacoustics indicates that such spatial unmasking in the perceptual domain enhances speech intelligibility in complex auditory scenes. We therefore hypothesize that the proposed rendering improves intelligibility. Both subjective and objective evaluations support this hypothesis, including a significant 26% improvement in modified rhyme test (MRT) scores.
KW  - Source separation
KW  - binaural hearing
KW  - speech enhancement
KW  - speech intelligibility
UR  - http://www.scopus.com/inward/record.url?scp=85218183200&partnerID=8YFLogxK
U2  - 10.1109/APSIPAASC63619.2025.10848776
DO  - 10.1109/APSIPAASC63619.2025.10848776
M3  - Conference contribution
AN  - SCOPUS:85218183200
T3  - APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
BT  - APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
PB  - Institute of Electrical and Electronics Engineers Inc.
T2  - 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024
Y2  - 3 December 2024 through 6 December 2024
ER  -

Zheng T, Pei H, Pan N, Jin J, Huang G, Chen J et al. A Single-Input/Binaural-Output Perceptual Rendering Based Speech Separation Method in Noisy Environments. In APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024. Institute of Electrical and Electronics Engineers Inc. 2024. (APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024). doi: 10.1109/APSIPAASC63619.2025.10848776
