AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition

  • Yuhang Dai
  • , He Wang
  • , Xingchen Li
  • , Zihan Zhang
  • , Shuiyuan Wang
  • , Lei Xie
  • , Xin Xu
  • , Hongxiao Guo
  • , Shaoji Zhang
  • , Hui Bu
  • , Wei Chen

Research output: Contribution to journalConference articlepeer-review

Abstract

This paper delineates AISHELL-5, the first open-source in-car multi-channel multi-speaker Mandarin automatic speech recognition (ASR) dataset. AISHLL-5 includes two parts: (1) over 100 hours of multi-channel speech data recorded in an electric vehicle across more than 60 real driving scenarios. This audio data consists of four far-field speech signals captured by microphones located on each car door, as well as near-field signals obtained from high-fidelity headset microphones worn by each speaker. (2) a collection of 40 hours of real-world environmental noise recordings, which supports the in-car speech data simulation. Moreover, we also provide an open-access, reproducible baseline system based on this dataset. This system features a speech frontend model that employs speech source separation to extract each speaker's clean speech from the far-field signals, along with a speech recognition module that accurately transcribes the content of each individual speaker. Experimental results demonstrate the challenges faced by various mainstream ASR models when evaluated on the AISHELL-5. We firmly believe the AISHELL-5 dataset will significantly advance the research on ASR systems under complex driving scenarios by establishing the first publicly available in-car ASR benchmark.

Original languageEnglish
Pages (from-to)5493-5497
Number of pages5
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOIs
StatePublished - 2025
Event26th Interspeech Conference 2025 - Rotterdam, Netherlands
Duration: 17 Aug 202521 Aug 2025

Keywords

  • AISHELL-5
  • in-car speech processing
  • speech frontend
  • speech recognition

Fingerprint

Dive into the research topics of 'AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition'. Together they form a unique fingerprint.

Cite this