TY - GEN
T1 - Optimizing Dysarthria Wake-Up Word Spotting
T2 - 2024 IEEE Spoken Language Technology Workshop, SLT 2024
AU - Liu, Shuiyun
AU - Kong, Yuxiang
AU - Guo, Pengcheng
AU - Zhuang, Weiji
AU - Gao, Peng
AU - Wang, Yujun
AU - Xie, Lei
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Speech has emerged as a widely adopted user interface across diverse applications. However, for individuals with dysarthria, the inherent variability of their speech poses significant challenges. This paper presents an end-to-end Pretrain-based Dual-filter Dysarthria Wake-up word Spotting (PD-DWS) system for the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting Challenge. Specifically, our system improves performance from two perspectives: audio modeling and a dual-filter strategy. For audio modeling, we propose a 2branch-d2v2 model based on the pre-trained data2vec 2 (d2v2), which jointly models automatic speech recognition (ASR) and wake-up word spotting (WWS) through a unified multi-task fine-tuning paradigm. Additionally, we introduce a dual-filter strategy that reduces the false accept rate (FAR) while keeping the false reject rate (FRR) unchanged. Experimental results show that our PD-DWS system achieves an FAR of 0.00321 and an FRR of 0.005, for a total score (FAR + FRR) of 0.00821 on the test-B evaluation set, securing first place in the challenge.
KW - 2branch-d2v2
KW - LRDWWS challenge
KW - dual-filter
KW - wake-up word spotting
UR - http://www.scopus.com/inward/record.url?scp=85217373102&partnerID=8YFLogxK
U2 - 10.1109/SLT61566.2024.10832263
DO - 10.1109/SLT61566.2024.10832263
M3 - Conference contribution
AN - SCOPUS:85217373102
T3 - Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024
SP - 578
EP - 585
BT - Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 2 December 2024 through 5 December 2024
ER -