TY - JOUR
T1 - AS-70
T2 - 25th Interspeech Conference 2024
AU - Gong, Rong
AU - Xue, Hongfei
AU - Wang, Lezhi
AU - Xu, Xin
AU - Li, Qisheng
AU - Xie, Lei
AU - Bu, Hui
AU - Wu, Shaomei
AU - Zhou, Jiaming
AU - Qin, Yong
AU - Zhang, Binbin
AU - Du, Jun
AU - Bin, Jia
AU - Li, Ming
N1 - Publisher Copyright:
© 2024 International Speech Communication Association. All rights reserved.
PY - 2024
Y1 - 2024
N2 - The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the largest dataset in its category. Encompassing conversational and voice command reading speech, AS-70 includes verbatim manual transcription, rendering it suitable for various speech-related tasks. Furthermore, baseline systems are established, and experimental results are presented for ASR and stuttering event detection (SED) tasks. By incorporating this dataset into model fine-tuning, significant improvements in state-of-the-art ASR models, e.g., Whisper and HuBERT, are observed, enhancing their inclusivity in addressing stuttered speech.
KW - Mandarin stuttered speech dataset
KW - speech recognition
KW - stuttering event detection
UR - http://www.scopus.com/inward/record.url?scp=85204816586&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2024-918
DO - 10.21437/Interspeech.2024-918
M3 - Conference article
AN - SCOPUS:85204816586
SN - 2308-457X
SP - 5098
EP - 5102
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Y2 - 1 September 2024 through 5 September 2024
ER -