Summary on the Multimodal Information-Based Speech Processing (MISP) 2023 Challenge

Hang Chen, Shilong Wu, Chenxi Wang, Jun Du, Chin Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Jingdong Chen, Odette Scharenborg, Zhong Qiu Wang, Bao Cai Yin, Jia Pan

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Historically, MISP challenges have focused on audio-visual speech recognition (AVSR), where they have been particularly successful in complex acoustic scenarios. However, even the most sophisticated AVSR systems have been found to have performance limitations. Inspired by traditional robust speech recognition systems, where speech enhancement as a front-end can significantly improve accuracy, the MISP2023 challenge focused on audio-visual target speaker extraction (AVTSE). The primary goal of AVTSE is to enhance speech quality by exploiting the lip movements of the target speaker, thereby improving the final recognition performance. This paper provides a comprehensive overview of the challenge framework, describes the results, and summarizes the effective strategies employed by the contributions. In addition, we analyze the prevailing technical hurdles and provide recommendations for future directions to spur further progress in the AVTSE field.

Original languageEnglish
Title of host publication2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages123-124
Number of pages2
ISBN (Electronic)9798350374513
DOIs
StatePublished - 2024
Event2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Seoul, Korea, Republic of
Duration: 14 Apr 202419 Apr 2024

Publication series

Name2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Proceedings

Conference

Conference2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024
Country/TerritoryKorea, Republic of
CitySeoul
Period14/04/2419/04/24

Keywords

  • MISP challenge
  • audio-visual
  • robust speech recognition
  • target speaker extraction

Fingerprint

Dive into the research topics of 'Summary on the Multimodal Information-Based Speech Processing (MISP) 2023 Challenge'. Together they form a unique fingerprint.

Cite this