A Single-Input/Binaural-Output Antiphasic Speech Enhancement Method for Speech Intelligibility Improvement

Ningning Pan; Yuzhu Wang; Jingdong Chen; Jacob Benesty

doi:10.1109/LSP.2021.3095016


School of Marine Science and Technology

Research output: Contribution to journal › Article › Peer-review

10 Citations (Scopus)

Abstract

Improving the intelligibility of a speech signal of interest from its single-microphone observations corrupted by additive noise has long been a challenging problem. Motivated by important findings in the psychoacoustic field, we propose in this work a deep-learning-based method that renders the noise and the desired speech in the perceptual space such that the perception of the desired speech is least affected by the noise. Specifically, we adopt a temporal convolutional network (TCN) based structure to map the single-channel noisy observations into two binaural signals, one for the left ear and the other for the right ear. The TCN is trained such that the desired speech and the noise are perceived in opposite directions when the listener listens to the binaural signals. This antiphasic binaural presentation enables the listener to better distinguish the desired speech from the annoying noise, improving speech intelligibility. The modified rhyme test is performed for evaluation, and the results demonstrate the superiority of the proposed method for speech intelligibility improvement.
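The abstract does not spell out the exact two-channel training target. As a minimal sketch, assuming an SπN0-style convention (the speech component is polarity-inverted between the ears while the noise is presented identically to both ears, giving an interaural phase difference of π between speech and noise), the left/right target that a mono-to-binaural network could be trained to output might be built as below. The names `antiphasic_target` and `binaural_mse` are hypothetical, the MSE loss is an assumed choice, and the sinusoid and white noise are stand-ins rather than the paper's data.

```python
import numpy as np


def antiphasic_target(speech: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Build a hypothetical (left, right) training target from clean components.

    Assumption (not confirmed by the source): S-pi/N-0 rendering, i.e. the
    speech is polarity-inverted across the ears and the noise is identical
    in both ears.
    """
    if speech.shape != noise.shape:
        raise ValueError("speech and noise must have the same shape")
    left = speech + noise
    right = -speech + noise
    return np.stack([left, right])  # shape (2, num_samples)


def binaural_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error over both ear channels (an assumed training loss)."""
    return float(np.mean((pred - target) ** 2))


rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
speech = 0.5 * np.sin(2 * np.pi * 440 * t)  # stand-in for the clean speech
noise = 0.1 * rng.standard_normal(fs)       # stand-in for the additive noise
mixture = speech + noise                    # the single-microphone observation

target = antiphasic_target(speech, noise)

# Half-sum and half-difference of the ears separate noise and speech,
# showing the two components have opposite interaural polarity.
assert np.allclose((target[0] + target[1]) / 2, noise)
assert np.allclose((target[0] - target[1]) / 2, speech)

# A trivial diotic baseline (mixture copied to both ears) is penalized
# only on the right channel, where its speech polarity is wrong.
diotic = np.stack([mixture, mixture])
print(binaural_mse(diotic, target))
```

In this convention the left channel coincides with the noisy mixture, so a network would mainly have to learn to flip the speech polarity in the right channel; other antiphasic conventions (for example S0Nπ, inverting the noise instead) would change only the sign pattern in `antiphasic_target`.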

Original language	English
Article number	9477058
Pages (from-to)	1445-1449
Number of pages	5
Journal	IEEE Signal Processing Letters
Volume	28
DOI	https://doi.org/10.1109/LSP.2021.3095016
Publication status	Published - 2021


Cite this

@article{6d523a4da00e4180bd70ca4561478088,

title = "A Single-Input/Binaural-Output Antiphasic Speech Enhancement Method for Speech Intelligibility Improvement",

abstract = "Improving intelligibility of a speech signal of interest from its observations (with a single microphone) corrupted by additive noise has long been a challenging problem. Motivated by important findings achieved in the psychoacoustic field, we propose in this work a deep learning based method to render the noise and desired speech in the perceptual space such that the perception of the desired speech is least affected by the noise. Specifically, we adopt the temporal convolutional network (TCN) based structure to map the single-channel noisy observations into two binaural signals, one for the left ear and the other for the right ear. The TCN is trained in such a way that the desired speech and noise will be perceived to be in opposite directions when the listener listens to the binaural signals. This antiphasic binaural presentation enables the listener to better distinguish the desired speech from the annoying noise for improved speech intelligibility. The modified rhyme test is performed for evaluation and the results justify the superiority of the proposed method for speech intelligibility improvement.",

keywords = "Antiphasic rendering, Binaural, Deep learning, Intelligibility, Modified rhyme test, Speech enhancement",

author = "Ningning Pan and Yuzhu Wang and Jingdong Chen and Jacob Benesty",

note = "Publisher Copyright: {\textcopyright} 1994-2012 IEEE.",

year = "2021",

doi = "10.1109/LSP.2021.3095016",

language = "English",

volume = "28",

pages = "1445--1449",

journal = "IEEE Signal Processing Letters",

issn = "1070-9908",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - A Single-Input/Binaural-Output Antiphasic Speech Enhancement Method for Speech Intelligibility Improvement

AU - Pan, Ningning

AU - Wang, Yuzhu

AU - Chen, Jingdong

AU - Benesty, Jacob

PY - 2021

Y1 - 2021

N2 - Improving intelligibility of a speech signal of interest from its observations (with a single microphone) corrupted by additive noise has long been a challenging problem. Motivated by important findings achieved in the psychoacoustic field, we propose in this work a deep learning based method to render the noise and desired speech in the perceptual space such that the perception of the desired speech is least affected by the noise. Specifically, we adopt the temporal convolutional network (TCN) based structure to map the single-channel noisy observations into two binaural signals, one for the left ear and the other for the right ear. The TCN is trained in such a way that the desired speech and noise will be perceived to be in opposite directions when the listener listens to the binaural signals. This antiphasic binaural presentation enables the listener to better distinguish the desired speech from the annoying noise for improved speech intelligibility. The modified rhyme test is performed for evaluation and the results justify the superiority of the proposed method for speech intelligibility improvement.

AB - Improving intelligibility of a speech signal of interest from its observations (with a single microphone) corrupted by additive noise has long been a challenging problem. Motivated by important findings achieved in the psychoacoustic field, we propose in this work a deep learning based method to render the noise and desired speech in the perceptual space such that the perception of the desired speech is least affected by the noise. Specifically, we adopt the temporal convolutional network (TCN) based structure to map the single-channel noisy observations into two binaural signals, one for the left ear and the other for the right ear. The TCN is trained in such a way that the desired speech and noise will be perceived to be in opposite directions when the listener listens to the binaural signals. This antiphasic binaural presentation enables the listener to better distinguish the desired speech from the annoying noise for improved speech intelligibility. The modified rhyme test is performed for evaluation and the results justify the superiority of the proposed method for speech intelligibility improvement.

KW - Antiphasic rendering

KW - Binaural

KW - Deep learning

KW - Intelligibility

KW - Modified rhyme test

KW - Speech enhancement

UR - http://www.scopus.com/inward/record.url?scp=85111973050&partnerID=8YFLogxK

U2 - 10.1109/LSP.2021.3095016

DO - 10.1109/LSP.2021.3095016

M3 - Article

AN - SCOPUS:85111973050

SN - 1070-9908

VL - 28

SP - 1445

EP - 1449

JO - IEEE Signal Processing Letters

JF - IEEE Signal Processing Letters

M1 - 9477058

ER -
