Summary on the Multimodal Information-Based Speech Processing (MISP) 2023 Challenge

Hang Chen, Shilong Wu, Chenxi Wang, Jun Du, Chin Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Jingdong Chen, Odette Scharenborg, Zhong Qiu Wang, Bao Cai Yin, Jia Pan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-reviewed

Abstract

Historically, MISP challenges have focused on audio-visual speech recognition (AVSR), where they have been particularly successful in complex acoustic scenarios. However, even the most sophisticated AVSR systems have been found to have performance limitations. Inspired by traditional robust speech recognition systems, where speech enhancement as a front-end can significantly improve accuracy, the MISP2023 challenge focused on audio-visual target speaker extraction (AVTSE). The primary goal of AVTSE is to enhance speech quality by exploiting the lip movements of the target speaker, thereby improving the final recognition performance. This paper provides a comprehensive overview of the challenge framework, describes the results, and summarizes the effective strategies employed by the contributions. In addition, we analyze the prevailing technical hurdles and provide recommendations for future directions to spur further progress in the AVTSE field.

Original language: English
Title of host publication: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 123-124
Number of pages: 2
ISBN (electronic): 9798350374513
Publication status: Published - 2024
Event: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Seoul, South Korea
Duration: 14 Apr 2024 to 19 Apr 2024

Publication series

Name: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Proceedings

Conference

Conference: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024
Country/Territory: South Korea
City: Seoul
Period: 14/04/24 to 19/04/24
