Optimizing Dysarthria Wake-Up Word Spotting: an End-to-End Approach For SLT 2024 LRDWWS Challenge

Shuiyun Liu; Yuxiang Kong; Pengcheng Guo; Weiji Zhuang; Peng Gao; Yujun Wang; Lei Xie

doi:10.1109/SLT61566.2024.10832263

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Speech has emerged as a widely embraced user interface across diverse applications. However, for individuals with dysarthria, the inherent variability in their speech poses significant challenges. This paper presents an end-to-end Pretrain-based Dual-filter Dysarthria Wake-up word Spotting (PD-DWS) system for the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting Challenge. Specifically, our system improves performance from two key perspectives: audio modeling and dual-filter strategy. For audio modeling, we propose an innovative 2branch-d2v2 model based on the pre-trained data2vec2 (d2v2), which can simultaneously model automatic speech recognition (ASR) and wake-up word spotting (WWS) tasks through a unified multi-task finetuning paradigm. Additionally, a dual-filter strategy is introduced to reduce the false accept rate (FAR) while maintaining the same false reject rate (FRR). Experimental results demonstrate that our PD-DWS system achieves an FAR of 0.00321 and an FRR of 0.005, with a total score of 0.00821 on the test-B eval set, securing first place in the challenge.
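The figures reported in the abstract are consistent with the total score being the plain sum of FAR and FRR. The sketch below checks that arithmetic. The function names and the rate definitions (false accepts over negative trials, false rejects over positive trials) are illustrative assumptions based on the standard meaning of these metrics, not code or definitions taken from the paper.

```python
from decimal import Decimal


def false_accept_rate(false_accepts: int, negative_trials: int) -> float:
    """Standard FAR: fraction of non-wake-word trials wrongly accepted (assumed definition)."""
    return false_accepts / negative_trials


def false_reject_rate(false_rejects: int, positive_trials: int) -> float:
    """Standard FRR: fraction of wake-word trials wrongly rejected (assumed definition)."""
    return false_rejects / positive_trials


def total_score(far: Decimal, frr: Decimal) -> Decimal:
    """Assumed scoring rule: score = FAR + FRR, which matches the reported numbers."""
    return far + frr


# Values reported in the abstract for the test-B eval set.
reported_far = Decimal("0.00321")
reported_frr = Decimal("0.005")

print(total_score(reported_far, reported_frr))  # 0.00821
```

Decimal arithmetic is used so that the sum reproduces the reported value exactly rather than with binary floating-point rounding.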

Original language	English
Title of host publication	Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	578-585
Number of pages	8
ISBN (Electronic)	9798350392258
DOIs	https://doi.org/10.1109/SLT61566.2024.10832263
State	Published - 2024
Event	2024 IEEE Spoken Language Technology Workshop, SLT 2024 - Macao, China; Duration: 2 Dec 2024 → 5 Dec 2024

Publication series

Name	Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024

Conference

Conference	2024 IEEE Spoken Language Technology Workshop, SLT 2024
Country/Territory	China
City	Macao
Period	2/12/24 → 5/12/24

Keywords

2branch-d2v2
LRDWWS challenge
dual-filter
wake-up word spotting

Access to Document

10.1109/SLT61566.2024.10832263

Cite this

Liu, S., Kong, Y., Guo, P., Zhuang, W., Gao, P., Wang, Y., & Xie, L. (2024). Optimizing Dysarthria Wake-Up Word Spotting: an End-to-End Approach For SLT 2024 LRDWWS Challenge. In Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024 (pp. 578-585). (Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SLT61566.2024.10832263

Liu, Shuiyun ; Kong, Yuxiang ; Guo, Pengcheng et al. / Optimizing Dysarthria Wake-Up Word Spotting : an End-to-End Approach For SLT 2024 LRDWWS Challenge. Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024. Institute of Electrical and Electronics Engineers Inc., 2024. pp. 578-585 (Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024).

@inproceedings{cd453b2aa69d49c58e2530f401cf7751,

title = "Optimizing Dysarthria Wake-Up Word Spotting: an End-to-End Approach For SLT 2024 LRDWWS Challenge",

abstract = "Speech has emerged as a widely embraced user interface across diverse applications. However, for individuals with dysarthria, the inherent variability in their speech poses significant challenges. This paper presents an end-to-end Pretrain-based Dual-filter Dysarthria Wake-up word Spotting (PD-DWS) system for the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting Challenge. Specifically, our system improves performance from two key perspectives: audio modeling and dual-filter strategy. For audio modeling, we propose an innovative 2branch-d2v2 model based on the pre-trained data2vec2 (d2v2), which can simultaneously model automatic speech recognition (ASR) and wake-up word spotting (WWS) tasks through a unified multi-task finetuning paradigm. Additionally, a dual-filter strategy is introduced to reduce the false accept rate (FAR) while maintaining the same false reject rate (FRR). Experimental results demonstrate that our PD-DWS system achieves an FAR of 0.00321 and an FRR of 0.005, with a total score of 0.00821 on the test-B eval set, securing first place in the challenge.",

keywords = "2branch-d2v2, LRDWWS challenge, dual-filter, wake-up word spotting",

author = "Shuiyun Liu and Yuxiang Kong and Pengcheng Guo and Weiji Zhuang and Peng Gao and Yujun Wang and Lei Xie",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 2024 IEEE Spoken Language Technology Workshop, SLT 2024 ; Conference date: 02-12-2024 Through 05-12-2024",

year = "2024",

doi = "10.1109/SLT61566.2024.10832263",

language = "English",

series = "Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "578--585",

booktitle = "Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024",

}

Liu, S, Kong, Y, Guo, P, Zhuang, W, Gao, P, Wang, Y & Xie, L 2024, Optimizing Dysarthria Wake-Up Word Spotting: an End-to-End Approach For SLT 2024 LRDWWS Challenge. in Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024. Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024, Institute of Electrical and Electronics Engineers Inc., pp. 578-585, 2024 IEEE Spoken Language Technology Workshop, SLT 2024, Macao, China, 2/12/24. https://doi.org/10.1109/SLT61566.2024.10832263

Optimizing Dysarthria Wake-Up Word Spotting: an End-to-End Approach For SLT 2024 LRDWWS Challenge. / Liu, Shuiyun; Kong, Yuxiang; Guo, Pengcheng et al.
Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024. Institute of Electrical and Electronics Engineers Inc., 2024. p. 578-585 (Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024).

TY - GEN

T1 - Optimizing Dysarthria Wake-Up Word Spotting: an End-to-End Approach For SLT 2024 LRDWWS Challenge

T2 - 2024 IEEE Spoken Language Technology Workshop, SLT 2024

AU - Liu, Shuiyun

AU - Kong, Yuxiang

AU - Guo, Pengcheng

AU - Zhuang, Weiji

AU - Gao, Peng

AU - Wang, Yujun

AU - Xie, Lei

PY - 2024

Y1 - 2024

N2 - Speech has emerged as a widely embraced user interface across diverse applications. However, for individuals with dysarthria, the inherent variability in their speech poses significant challenges. This paper presents an end-to-end Pretrain-based Dual-filter Dysarthria Wake-up word Spotting (PD-DWS) system for the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting Challenge. Specifically, our system improves performance from two key perspectives: audio modeling and dual-filter strategy. For audio modeling, we propose an innovative 2branch-d2v2 model based on the pre-trained data2vec2 (d2v2), which can simultaneously model automatic speech recognition (ASR) and wake-up word spotting (WWS) tasks through a unified multi-task finetuning paradigm. Additionally, a dual-filter strategy is introduced to reduce the false accept rate (FAR) while maintaining the same false reject rate (FRR). Experimental results demonstrate that our PD-DWS system achieves an FAR of 0.00321 and an FRR of 0.005, with a total score of 0.00821 on the test-B eval set, securing first place in the challenge.

AB - Speech has emerged as a widely embraced user interface across diverse applications. However, for individuals with dysarthria, the inherent variability in their speech poses significant challenges. This paper presents an end-to-end Pretrain-based Dual-filter Dysarthria Wake-up word Spotting (PD-DWS) system for the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting Challenge. Specifically, our system improves performance from two key perspectives: audio modeling and dual-filter strategy. For audio modeling, we propose an innovative 2branch-d2v2 model based on the pre-trained data2vec2 (d2v2), which can simultaneously model automatic speech recognition (ASR) and wake-up word spotting (WWS) tasks through a unified multi-task finetuning paradigm. Additionally, a dual-filter strategy is introduced to reduce the false accept rate (FAR) while maintaining the same false reject rate (FRR). Experimental results demonstrate that our PD-DWS system achieves an FAR of 0.00321 and an FRR of 0.005, with a total score of 0.00821 on the test-B eval set, securing first place in the challenge.

KW - 2branch-d2v2

KW - LRDWWS challenge

KW - dual-filter

KW - wake-up word spotting

UR - http://www.scopus.com/inward/record.url?scp=85217373102&partnerID=8YFLogxK

U2 - 10.1109/SLT61566.2024.10832263

DO - 10.1109/SLT61566.2024.10832263

M3 - Conference contribution

AN - SCOPUS:85217373102

T3 - Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024

SP - 578

EP - 585

BT - Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 2 December 2024 through 5 December 2024

ER -

Liu S, Kong Y, Guo P, Zhuang W, Gao P, Wang Y et al. Optimizing Dysarthria Wake-Up Word Spotting: an End-to-End Approach For SLT 2024 LRDWWS Challenge. In Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024. Institute of Electrical and Electronics Engineers Inc. 2024. p. 578-585. (Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024). doi: 10.1109/SLT61566.2024.10832263
