AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection

Rong Gong; Hongfei Xue; Lezhi Wang; Xin Xu; Qisheng Li; Lei Xie; Hui Bu; Shaomei Wu; Jiaming Zhou; Yong Qin; Binbin Zhang; Jun Du; Jia Bin; Ming Li

doi:10.21437/Interspeech.2024-918

School of Computer Science

Research output: Contribution to journal › Conference article › peer-review

2 Scopus citations

Abstract

The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the largest dataset in its category. Encompassing conversational and voice command reading speech, AS-70 includes verbatim manual transcription, rendering it suitable for various speech-related tasks. Furthermore, baseline systems are established, and experimental results are presented for ASR and stuttering event detection (SED) tasks. By incorporating this dataset into the model fine-tuning, significant improvements in the state-of-the-art ASR models, e.g., Whisper and Hubert, are observed, enhancing their inclusivity in addressing stuttered speech.

Original language	English
Pages (from-to)	5098-5102
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOIs	https://doi.org/10.21437/Interspeech.2024-918
State	Published - 2024
Event	25th Interspeech Conference 2024 - Kos Island, Greece. Duration: 1 Sep 2024 → 5 Sep 2024

Keywords

mandarin stuttered speech dataset
speech recognition
stuttering event detection

Access to Document

10.21437/Interspeech.2024-918

Cite this

Gong, R., Xue, H., Wang, L., Xu, X., Li, Q., Xie, L., Bu, H., Wu, S., Zhou, J., Qin, Y., Zhang, B., Du, J., Bin, J., & Li, M. (2024). AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 5098-5102. https://doi.org/10.21437/Interspeech.2024-918

@article{4fcc38d6abb0458fbf67b7d1066e943f,

title = "AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection",

abstract = "The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the largest dataset in its category. Encompassing conversational and voice command reading speech, AS-70 includes verbatim manual transcription, rendering it suitable for various speech-related tasks. Furthermore, baseline systems are established, and experimental results are presented for ASR and stuttering event detection (SED) tasks. By incorporating this dataset into the model fine-tuning, significant improvements in the state-of-the-art ASR models, e.g., Whisper and Hubert, are observed, enhancing their inclusivity in addressing stuttered speech.",

keywords = "mandarin stuttered speech dataset, speech recognition, stuttering event detection",

author = "Rong Gong and Hongfei Xue and Lezhi Wang and Xin Xu and Qisheng Li and Lei Xie and Hui Bu and Shaomei Wu and Jiaming Zhou and Yong Qin and Binbin Zhang and Jun Du and Jia Bin and Ming Li",

note = "Publisher Copyright: {\textcopyright} 2024 International Speech Communication Association. All rights reserved.; 25th Interspeech Conference 2024 ; Conference date: 01-09-2024 Through 05-09-2024",

year = "2024",

doi = "10.21437/Interspeech.2024-918",

language = "English",

pages = "5098--5102",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

Gong, R, Xue, H, Wang, L, Xu, X, Li, Q, Xie, L, Bu, H, Wu, S, Zhou, J, Qin, Y, Zhang, B, Du, J, Bin, J & Li, M 2024, 'AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection', Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 5098-5102. https://doi.org/10.21437/Interspeech.2024-918

TY - JOUR

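T1 - AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection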

T2 - 25th Interspeech Conference 2024

AU - Gong, Rong

AU - Xue, Hongfei

AU - Wang, Lezhi

AU - Xu, Xin

AU - Li, Qisheng

AU - Xie, Lei

AU - Bu, Hui

AU - Wu, Shaomei

AU - Zhou, Jiaming

AU - Qin, Yong

AU - Zhang, Binbin

AU - Du, Jun

AU - Bin, Jia

AU - Li, Ming

PY - 2024

Y1 - 2024

N2 - The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the largest dataset in its category. Encompassing conversational and voice command reading speech, AS-70 includes verbatim manual transcription, rendering it suitable for various speech-related tasks. Furthermore, baseline systems are established, and experimental results are presented for ASR and stuttering event detection (SED) tasks. By incorporating this dataset into the model fine-tuning, significant improvements in the state-of-the-art ASR models, e.g., Whisper and Hubert, are observed, enhancing their inclusivity in addressing stuttered speech.

AB - The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the largest dataset in its category. Encompassing conversational and voice command reading speech, AS-70 includes verbatim manual transcription, rendering it suitable for various speech-related tasks. Furthermore, baseline systems are established, and experimental results are presented for ASR and stuttering event detection (SED) tasks. By incorporating this dataset into the model fine-tuning, significant improvements in the state-of-the-art ASR models, e.g., Whisper and Hubert, are observed, enhancing their inclusivity in addressing stuttered speech.

KW - mandarin stuttered speech dataset

KW - speech recognition

KW - stuttering event detection

UR - http://www.scopus.com/inward/record.url?scp=85204816586&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2024-918

DO - 10.21437/Interspeech.2024-918

M3 - Conference article

AN - SCOPUS:85204816586

SN - 2308-457X

SP - 5098

EP - 5102

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Y2 - 1 September 2024 through 5 September 2024

ER -
