Improved speaker-dependent separation for CHiME-5 challenge

Jian Wu; Yong Xu; Shi Xiong Zhang; Lian Wu Chen; Meng Yu; Lei Xie; Dong Yu

doi:10.21437/Interspeech.2019-1569

School of Computer Science

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

This paper summarizes several contributions for improving the speaker-dependent separation system for the CHiME-5 challenge, which aims to solve the problem of multi-channel, highly overlapped conversational speech recognition in a dinner party scenario with reverberation and non-stationary noise. Specifically, we adopt a speaker-aware training method that uses i-vectors as the target speaker information for multi-talker speech separation. With only one unified separation model for all speakers, we achieve a 10% absolute improvement in word error rate (WER) over the previous baseline of 80.28% on the development set by leveraging our newly proposed data processing techniques and beamforming approach. With our improved back-end acoustic model, we further reduce the WER to 60.15%, which surpasses the result of our submitted CHiME-5 challenge system without applying any fusion techniques.

Original language	English
Pages (from-to)	466-470
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2019-September
DOIs	https://doi.org/10.21437/Interspeech.2019-1569
State	Published - 2019
Event	20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria
Duration	15 Sep 2019 → 19 Sep 2019

Keywords

Beamforming
CHiME-5 challenge
Robust speech recognition
Speaker-dependent speech separation
Speech enhancement

Access to Document

10.21437/Interspeech.2019-1569

Cite this

@article{99f5afafc40a40a9b4962a322a92457b,

title = "Improved speaker-dependent separation for {CHiME-5} challenge",

abstract = "This paper summarizes several contributions for improving the speaker-dependent separation system for CHiME-5 challenge, which aims to solve the problem of multi-channel, highly-overlapped conversational speech recognition in a dinner party scenario with reverberations and non-stationary noises. Specifically, we adopt a speaker-aware training method by using i-vector as the target speaker information for multi-talker speech separation. With only one unified separation model for all speakers, we achieve a 10% absolute improvement in terms of word error rate (WER) over the previous baseline of 80.28% on the development set by leveraging our newly proposed data processing techniques and beamforming approach. With our improved back-end acoustic model, we further reduce WER to 60.15% which surpasses the result of our submitted CHiME-5 challenge system without applying any fusion techniques.",

keywords = "Beamforming, CHiME-5 challenge, Robust speech recognition, Speaker-dependent speech separation, Speech enhancement",

author = "Jian Wu and Yong Xu and Zhang, {Shi Xiong} and Chen, {Lian Wu} and Meng Yu and Lei Xie and Dong Yu",

note = "Publisher Copyright: Copyright {\textcopyright} 2019 ISCA; 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 ; Conference date: 15-09-2019 Through 19-09-2019",

year = "2019",

doi = "10.21437/Interspeech.2019-1569",

language = "English",

volume = "2019-September",

pages = "466--470",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - Improved speaker-dependent separation for CHiME-5 challenge

AU - Wu, Jian

AU - Xu, Yong

AU - Zhang, Shi Xiong

AU - Chen, Lian Wu

AU - Yu, Meng

AU - Xie, Lei

AU - Yu, Dong

PY - 2019

Y1 - 2019

AB - This paper summarizes several contributions for improving the speaker-dependent separation system for CHiME-5 challenge, which aims to solve the problem of multi-channel, highly-overlapped conversational speech recognition in a dinner party scenario with reverberations and non-stationary noises. Specifically, we adopt a speaker-aware training method by using i-vector as the target speaker information for multi-talker speech separation. With only one unified separation model for all speakers, we achieve a 10% absolute improvement in terms of word error rate (WER) over the previous baseline of 80.28% on the development set by leveraging our newly proposed data processing techniques and beamforming approach. With our improved back-end acoustic model, we further reduce WER to 60.15% which surpasses the result of our submitted CHiME-5 challenge system without applying any fusion techniques.

KW - Beamforming

KW - CHiME-5 challenge

KW - Robust speech recognition

KW - Speaker-dependent speech separation

KW - Speech enhancement

UR - http://www.scopus.com/inward/record.url?scp=85074709833&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2019-1569

DO - 10.21437/Interspeech.2019-1569

M3 - Conference article

AN - SCOPUS:85074709833

SN - 2308-457X

VL - 2019-September

SP - 466

EP - 470

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019

Y2 - 15 September 2019 through 19 September 2019

ER -
