TY - GEN
T1 - M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge
T2 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022
AU - Yu, Fan
AU - Zhang, Shiliang
AU - Fu, Yihui
AU - Xie, Lei
AU - Zheng, Siqi
AU - Du, Zhihao
AU - Huang, Weilong
AU - Guo, Pengcheng
AU - Yan, Zhijie
AU - Ma, Bin
AU - Xu, Xin
AU - Bu, Hui
N1 - Publisher Copyright:
© 2022 IEEE
PY - 2022
Y1 - 2022
N2 - Recent developments in speech signal processing, such as speech recognition and speaker diarization, have inspired numerous applications of speech technologies. The meeting scenario is one of the most valuable and, at the same time, most challenging scenarios for the deployment of speech technologies. Speaker diarization and multi-speaker automatic speech recognition in meeting scenarios have attracted much attention recently. However, the lack of large public meeting data has been a major obstacle to advancement of the field. Therefore, we make available the AliMeeting corpus, which consists of 120 hours of recorded Mandarin meeting data, including far-field data collected by an 8-channel microphone array as well as near-field data collected by headset microphones. Each meeting session is composed of 2-4 speakers with different speaker overlap ratios, recorded in meeting rooms of different sizes. Along with the dataset, we launch the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) with two tracks, namely speaker diarization and multi-speaker ASR, aiming to provide a common testbed for meeting rich transcription and to promote reproducible research in this field. In this paper, we provide a detailed introduction of the AliMeeting dataset, challenge rules, evaluation methods and baseline systems.
AB - Recent developments in speech signal processing, such as speech recognition and speaker diarization, have inspired numerous applications of speech technologies. The meeting scenario is one of the most valuable and, at the same time, most challenging scenarios for the deployment of speech technologies. Speaker diarization and multi-speaker automatic speech recognition in meeting scenarios have attracted much attention recently. However, the lack of large public meeting data has been a major obstacle to advancement of the field. Therefore, we make available the AliMeeting corpus, which consists of 120 hours of recorded Mandarin meeting data, including far-field data collected by an 8-channel microphone array as well as near-field data collected by headset microphones. Each meeting session is composed of 2-4 speakers with different speaker overlap ratios, recorded in meeting rooms of different sizes. Along with the dataset, we launch the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) with two tracks, namely speaker diarization and multi-speaker ASR, aiming to provide a common testbed for meeting rich transcription and to promote reproducible research in this field. In this paper, we provide a detailed introduction of the AliMeeting dataset, challenge rules, evaluation methods and baseline systems.
KW - AliMeeting
KW - automatic speech recognition
KW - meeting scenario
KW - meeting transcription
KW - speaker diarization
UR - http://www.scopus.com/inward/record.url?scp=85128522290&partnerID=8YFLogxK
U2 - 10.1109/ICASSP43922.2022.9746465
DO - 10.1109/ICASSP43922.2022.9746465
M3 - Conference contribution
AN - SCOPUS:85128522290
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6167
EP - 6171
BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 22 May 2022 through 27 May 2022
ER -