Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge

Hang Chen, Shilong Wu, Yusheng Dai, Zhe Wang, Jun Du, Chin Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Di Yuan Liu, Bao Cai Yin, Jia Pan, Jian Qing Gao, Cong Liu

Research output: Contribution to journal › Conference article › peer-review

Abstract

The Multimodal Information based Speech Processing (MISP) 2022 challenge aimed to enhance speech processing performance in harsh acoustic environments by leveraging additional modalities such as video or text. The challenge included two tracks: audio-visual speaker diarization (AVSD) and audio-visual diarization and recognition (AVDR). The training material was based on the MISP 2021 recordings, but with accurately synchronized audio and visual data. Additionally, a new evaluation set was provided. This paper gives an overview of the challenge setup, presents the results, and summarizes the effective techniques employed by the participants. We also analyze the current technical challenges and suggest directions for future research in AVSD and AVDR.

Keywords

  • audio-visual
  • MISP challenge
  • speaker diarization
  • speech enhancement
  • speech recognition
