Improved speaker-dependent separation for CHiME-5 challenge

Abstract
This paper summarizes several contributions toward improving the speaker-dependent separation system for the CHiME-5 challenge, which targets multi-channel, highly overlapped conversational speech recognition in a dinner-party scenario with reverberation and non-stationary noise. Specifically, we adopt a speaker-aware training method that uses an i-vector as the target-speaker information for multi-talker speech separation. With only one unified separation model for all speakers, we achieve a 10% absolute improvement in word error rate (WER) over the previous baseline of 80.28% on the development set by leveraging our newly proposed data processing techniques and beamforming approach. With our improved back-end acoustic model, we further reduce the WER to 60.15%, which surpasses the result of our submitted CHiME-5 challenge system without applying any fusion techniques.
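As a minimal sketch of the speaker-aware training idea mentioned in the abstract — conditioning a separation model on the target speaker's i-vector — the snippet below appends a fixed-length i-vector to every frame of the mixture's spectral features. The function name, feature layout, and dimensions (257-dim spectra, 100-dim i-vector) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def speaker_aware_features(mixture_feats, ivector):
    """Append the target speaker's i-vector to every frame.

    mixture_feats: (T, F) spectral features of the mixed speech
    ivector:       (D,)  fixed-length speaker embedding (i-vector)
    returns:       (T, F + D) speaker-conditioned input for the separator
    """
    num_frames = mixture_feats.shape[0]
    tiled = np.tile(ivector, (num_frames, 1))  # repeat the i-vector per frame
    return np.concatenate([mixture_feats, tiled], axis=1)

# Toy example: 200 frames of 257-dim spectra, 100-dim i-vector.
feats = np.random.randn(200, 257)
ivec = np.random.randn(100)
conditioned = speaker_aware_features(feats, ivec)
print(conditioned.shape)  # (200, 357)
```

Because the i-vector is identical across frames, the same unified network can separate any target speaker by swapping in that speaker's embedding, which is what allows one model to serve all speakers.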
| Original language | English |
|---|---|
| Pages (from-to) | 466-470 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Volume | 2019-September |
| DOIs | |
| State | Published - 2019 |
| Event | 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019, Graz, Austria, 15–19 Sep 2019 |
Keywords
- Beamforming
- CHiME-5 challenge
- Robust speech recognition
- Speaker-dependent speech separation
- Speech enhancement