Improved speaker-dependent separation for CHiME-5 challenge

Jian Wu; Yong Xu; Shi Xiong Zhang; Lian Wu Chen; Meng Yu; Lei Xie; Dong Yu

doi:10.21437/Interspeech.2019-1569

School of Computer Science

Research output: Contribution to journal › Conference article › Peer-reviewed

1 citation (Scopus)

Abstract

This paper summarizes several contributions for improving the speaker-dependent separation system for CHiME-5 challenge, which aims to solve the problem of multi-channel, highly-overlapped conversational speech recognition in a dinner party scenario with reverberations and non-stationary noises. Specifically, we adopt a speaker-aware training method by using i-vector as the target speaker information for multi-talker speech separation. With only one unified separation model for all speakers, we achieve a 10% absolute improvement in terms of word error rate (WER) over the previous baseline of 80.28% on the development set by leveraging our newly proposed data processing techniques and beamforming approach. With our improved back-end acoustic model, we further reduce WER to 60.15% which surpasses the result of our submitted CHiME-5 challenge system without applying any fusion techniques.

Original language	English
Pages (from-to)	466-470
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2019-September
DOI	https://doi.org/10.21437/Interspeech.2019-1569
Publication status	Published - 2019
Event	20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria. Duration: 15 Sep 2019 → 19 Sep 2019

Cite this

@article{99f5afafc40a40a9b4962a322a92457b,

title = "Improved speaker-dependent separation for {CHiME-5} challenge",

abstract = "This paper summarizes several contributions for improving the speaker-dependent separation system for CHiME-5 challenge, which aims to solve the problem of multi-channel, highly-overlapped conversational speech recognition in a dinner party scenario with reverberations and non-stationary noises. Specifically, we adopt a speaker-aware training method by using i-vector as the target speaker information for multi-talker speech separation. With only one unified separation model for all speakers, we achieve a 10% absolute improvement in terms of word error rate (WER) over the previous baseline of 80.28% on the development set by leveraging our newly proposed data processing techniques and beamforming approach. With our improved back-end acoustic model, we further reduce WER to 60.15% which surpasses the result of our submitted CHiME-5 challenge system without applying any fusion techniques.",

keywords = "Beamforming, CHiME-5 challenge, Robust speech recognition, Speaker-dependent speech separation, Speech enhancement",

author = "Jian Wu and Yong Xu and Zhang, {Shi Xiong} and Chen, {Lian Wu} and Meng Yu and Lei Xie and Dong Yu",

note = "Publisher Copyright: Copyright {\textcopyright} 2019 ISCA; 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 ; Conference date: 15-09-2019 Through 19-09-2019",

year = "2019",

doi = "10.21437/Interspeech.2019-1569",

language = "English",

volume = "2019-September",

pages = "466--470",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - Improved speaker-dependent separation for CHiME-5 challenge

AU - Wu, Jian

AU - Xu, Yong

AU - Zhang, Shi Xiong

AU - Chen, Lian Wu

AU - Yu, Meng

AU - Xie, Lei

AU - Yu, Dong

PY - 2019

Y1 - 2019

N2 - This paper summarizes several contributions for improving the speaker-dependent separation system for CHiME-5 challenge, which aims to solve the problem of multi-channel, highly-overlapped conversational speech recognition in a dinner party scenario with reverberations and non-stationary noises. Specifically, we adopt a speaker-aware training method by using i-vector as the target speaker information for multi-talker speech separation. With only one unified separation model for all speakers, we achieve a 10% absolute improvement in terms of word error rate (WER) over the previous baseline of 80.28% on the development set by leveraging our newly proposed data processing techniques and beamforming approach. With our improved back-end acoustic model, we further reduce WER to 60.15% which surpasses the result of our submitted CHiME-5 challenge system without applying any fusion techniques.

KW - Beamforming

KW - CHiME-5 challenge

KW - Robust speech recognition

KW - Speaker-dependent speech separation

KW - Speech enhancement

UR - http://www.scopus.com/inward/record.url?scp=85074709833&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2019-1569

DO - 10.21437/Interspeech.2019-1569

M3 - Conference article

AN - SCOPUS:85074709833

SN - 2308-457X

VL - 2019-September

SP - 466

EP - 470

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019

Y2 - 15 September 2019 through 19 September 2019

ER -
