M2MET: THE ICASSP 2022 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE

Fan Yu; Shiliang Zhang; Yihui Fu; Lei Xie; Siqi Zheng; Zhihao Du; Weilong Huang; Pengcheng Guo; Zhijie Yan; Bin Ma; Xin Xu; Hui Bu

doi:10.1109/ICASSP43922.2022.9746465

M2MET: THE ICASSP 2022 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE

Fan Yu, Shiliang Zhang, Yihui Fu, Lei Xie, Siqi Zheng, Zhihao Du, Weilong Huang, Pengcheng Guo, Zhijie Yan, Bin Ma, Xin Xu, Hui Bu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

84 Scopus citations

Abstract

Recent development of speech signal processing, such as speech recognition, speaker diarization, etc., has inspired numerous applications of speech technologies. The meeting scenario is one of the most valuable and, at the same time, most challenging scenarios for the deployment of speech technologies. Speaker diarization and multi-speaker automatic speech recognition in meeting scenarios have attracted much attention recently. However, the lack of large public meeting data has been a major obstacle for advancement of the field. Therefore, we make available the AliMeeting corpus, which consists of 120 hours of recorded Mandarin meeting data, including far-field data collected by 8-channel microphone array as well as near-field data collected by headset microphone. Each meeting session is composed of 2-4 speakers with different speaker overlap ratio, recorded in meeting rooms with different size. Along with the dataset, we launch the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) with two tracks, namely speaker diarization and multi-speaker ASR, aiming to provide a common testbed for meeting rich transcription and promote reproducible research in this field. In this paper we provide a detailed introduction of the AliMeeting dateset, challenge rules, evaluation methods and baseline systems.

Original language	English
Title of host publication	2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	6167-6171
Number of pages	5
ISBN (Electronic)	9781665405409
DOIs	https://doi.org/10.1109/ICASSP43922.2022.9746465
State	Published - 2022
Externally published	Yes
Event	2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 - Hybrid, Singapore Duration: 22 May 2022 → 27 May 2022

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2022-May
ISSN (Print)	1520-6149

Conference

Conference	2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022
Country/Territory	Singapore
City	Hybrid
Period	22/05/22 → 27/05/22

Keywords

AliMeeting
automatic speech recognition
meeting scenario
meeting transcription
speak diarization

Access to Document

10.1109/ICASSP43922.2022.9746465

Cite this

Yu, F., Zhang, S., Fu, Y., Xie, L., Zheng, S., Du, Z., Huang, W., Guo, P., Yan, Z., Ma, B., Xu, X., & Bu, H. (2022). M2MET: THE ICASSP 2022 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE. In 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings (pp. 6167-6171). (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2022-May). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICASSP43922.2022.9746465

Yu, Fan ; Zhang, Shiliang ; Fu, Yihui et al. / M2MET : THE ICASSP 2022 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE. 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 6167-6171 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{e28386c789494aef9e9fff3b9330bfb5,

title = "M2MET: THE ICASSP 2022 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE",

abstract = "Recent development of speech signal processing, such as speech recognition, speaker diarization, etc., has inspired numerous applications of speech technologies. The meeting scenario is one of the most valuable and, at the same time, most challenging scenarios for the deployment of speech technologies. Speaker diarization and multi-speaker automatic speech recognition in meeting scenarios have attracted much attention recently. However, the lack of large public meeting data has been a major obstacle for advancement of the field. Therefore, we make available the AliMeeting corpus, which consists of 120 hours of recorded Mandarin meeting data, including far-field data collected by 8-channel microphone array as well as near-field data collected by headset microphone. Each meeting session is composed of 2-4 speakers with different speaker overlap ratio, recorded in meeting rooms with different size. Along with the dataset, we launch the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) with two tracks, namely speaker diarization and multi-speaker ASR, aiming to provide a common testbed for meeting rich transcription and promote reproducible research in this field. In this paper we provide a detailed introduction of the AliMeeting dateset, challenge rules, evaluation methods and baseline systems.",

keywords = "AliMeeting, automatic speech recognition, meeting scenario, meeting transcription, speak diarization",

author = "Fan Yu and Shiliang Zhang and Yihui Fu and Lei Xie and Siqi Zheng and Zhihao Du and Weilong Huang and Pengcheng Guo and Zhijie Yan and Bin Ma and Xin Xu and Hui Bu",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE; 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 ; Conference date: 22-05-2022 Through 27-05-2022",

year = "2022",

doi = "10.1109/ICASSP43922.2022.9746465",

language = "英语",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "6167--6171",

booktitle = "2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings",

}

Yu, F, Zhang, S, Fu, Y, Xie, L, Zheng, S, Du, Z, Huang, W, Guo, P, Yan, Z, Ma, B, Xu, X & Bu, H 2022, M2MET: THE ICASSP 2022 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE. in 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2022-May, Institute of Electrical and Electronics Engineers Inc., pp. 6167-6171, 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Hybrid, Singapore, 22/05/22. https://doi.org/10.1109/ICASSP43922.2022.9746465

M2MET: THE ICASSP 2022 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE. / Yu, Fan; Zhang, Shiliang; Fu, Yihui et al.
2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2022. p. 6167-6171 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2022-May).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

TY - GEN

T1 - M2MET

T2 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022

AU - Yu, Fan

AU - Zhang, Shiliang

AU - Fu, Yihui

AU - Xie, Lei

AU - Zheng, Siqi

AU - Du, Zhihao

AU - Huang, Weilong

AU - Guo, Pengcheng

AU - Yan, Zhijie

AU - Ma, Bin

AU - Xu, Xin

AU - Bu, Hui

PY - 2022

Y1 - 2022

N2 - Recent development of speech signal processing, such as speech recognition, speaker diarization, etc., has inspired numerous applications of speech technologies. The meeting scenario is one of the most valuable and, at the same time, most challenging scenarios for the deployment of speech technologies. Speaker diarization and multi-speaker automatic speech recognition in meeting scenarios have attracted much attention recently. However, the lack of large public meeting data has been a major obstacle for advancement of the field. Therefore, we make available the AliMeeting corpus, which consists of 120 hours of recorded Mandarin meeting data, including far-field data collected by 8-channel microphone array as well as near-field data collected by headset microphone. Each meeting session is composed of 2-4 speakers with different speaker overlap ratio, recorded in meeting rooms with different size. Along with the dataset, we launch the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) with two tracks, namely speaker diarization and multi-speaker ASR, aiming to provide a common testbed for meeting rich transcription and promote reproducible research in this field. In this paper we provide a detailed introduction of the AliMeeting dateset, challenge rules, evaluation methods and baseline systems.

AB - Recent development of speech signal processing, such as speech recognition, speaker diarization, etc., has inspired numerous applications of speech technologies. The meeting scenario is one of the most valuable and, at the same time, most challenging scenarios for the deployment of speech technologies. Speaker diarization and multi-speaker automatic speech recognition in meeting scenarios have attracted much attention recently. However, the lack of large public meeting data has been a major obstacle for advancement of the field. Therefore, we make available the AliMeeting corpus, which consists of 120 hours of recorded Mandarin meeting data, including far-field data collected by 8-channel microphone array as well as near-field data collected by headset microphone. Each meeting session is composed of 2-4 speakers with different speaker overlap ratio, recorded in meeting rooms with different size. Along with the dataset, we launch the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) with two tracks, namely speaker diarization and multi-speaker ASR, aiming to provide a common testbed for meeting rich transcription and promote reproducible research in this field. In this paper we provide a detailed introduction of the AliMeeting dateset, challenge rules, evaluation methods and baseline systems.

KW - AliMeeting

KW - automatic speech recognition

KW - meeting scenario

KW - meeting transcription

KW - speak diarization

UR - http://www.scopus.com/inward/record.url?scp=85128522290&partnerID=8YFLogxK

U2 - 10.1109/ICASSP43922.2022.9746465

DO - 10.1109/ICASSP43922.2022.9746465

M3 - 会议稿件

AN - SCOPUS:85128522290

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 6167

EP - 6171

BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 22 May 2022 through 27 May 2022

ER -

Yu F, Zhang S, Fu Y, Xie L, Zheng S, Du Z et al. M2MET: THE ICASSP 2022 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE. In 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2022. p. 6167-6171. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP43922.2022.9746465

M2MET: THE ICASSP 2022 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE

Abstract

Publication series

Conference

Keywords

Access to Document

Other files and links

Fingerprint

Cite this