The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition

Zhe Wang, Shilong Wu, Hang Chen, Mao Kui He, Jun Du, Chin Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

11 Citations (Scopus)

Abstract

The Multi-modal Information based Speech Processing (MISP) challenge aims to extend the application of signal processing technology in specific scenarios by promoting research into wake-up words, speaker diarization, speech recognition, and other technologies. The MISP2022 challenge has two tracks: 1) audio-visual speaker diarization (AVSD), which aims to solve "who spoke when" using both audio and visual data; 2) a novel audio-visual diarization and recognition (AVDR) task that focuses on addressing "who spoke what when" with audio-visual speaker diarization results. Both tracks focus on the Chinese language and use far-field audio and video in real home-TV scenarios: 2-6 people communicating with each other with TV noise in the background. This paper introduces the dataset, track settings, and baselines of the MISP2022 challenge. Our analyses of experiments and examples indicate the good performance of the AVDR baseline system and the potential difficulties in this challenge due to, e.g., far-field video quality, the presence of TV noise in the background, and indistinguishable speakers.

Original language: English
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728163277
DOI
Publication status: Published - 2023
Event: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
Duration: 4 Jun 2023 to 10 Jun 2023

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
2023-June
ISSN (Print): 1520-6149

Conference

Conference: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Country/Territory: Greece
Rhodes Island
Period: 4/06/23 to 10/06/23
