AISHELL-4: An open source dataset for speech enhancement, separation, recognition and speaker diarization in conference scenario

Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen, Yanxin Hu, Lei Xie, Jian Wu, Hui Bu, Xin Xu, Jun Du, Jingdong Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-reviewed

25 citations (Scopus)

Abstract

In this paper, we present AISHELL-4, a sizable real-recorded Mandarin speech dataset collected by an 8-channel circular microphone array for speech processing in conference scenarios. The dataset consists of 211 recorded meeting sessions, each containing 4 to 8 speakers, with a total length of 120 hours. This dataset aims to bridge advanced research on multi-speaker processing and the practical application scenario in three aspects. First, with real recorded meetings, AISHELL-4 provides realistic acoustics and rich natural speech characteristics in conversation, such as short pauses, speech overlap, quick speaker turns, and noise. Second, accurate transcription and speaker voice activity are provided for each meeting in AISHELL-4. This allows researchers to explore different aspects of meeting processing, ranging from individual tasks such as speech front-end processing, speech recognition, and speaker diarization to multi-modality modeling and joint optimization of relevant tasks. Third, given that most open-source datasets for multi-speaker tasks are in English, AISHELL-4 is the only Mandarin dataset for conversational speech, providing additional value for data diversity in the speech community. We also release a PyTorch-based training and evaluation framework as a baseline system to promote reproducible research in this field.

Original language: English
Title of host publication: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Publisher: International Speech Communication Association
Pages: 4406-4410
Number of pages: 5
ISBN (Electronic): 9781713836902
DOI
Publication status: Published - 2021
Event: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
Duration: 30 Aug 2021 to 3 Sep 2021

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 6
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Country/Territory: Czech Republic
City: Brno
Period: 30/08/21 to 3/09/21
