DCCRN-KWS: An Audio Bias Based Model for Noise Robust Small-Footprint Keyword Spotting

Shubo Lv; Xiong Wang; Sining Sun; Long Ma; Lei Xie

doi:10.21437/Interspeech.2023-1184


School of Computer Science

Research output: Contribution to journal › Conference article › Peer-review

4 Citations (Scopus)

Abstract

Real-world complex acoustic environments especially the ones with a low signal-to-noise ratio (SNR) will bring tremendous challenges to a keyword spotting (KWS) system. Inspired by the recent advances of neural speech enhancement and context bias in speech recognition, we propose a robust audio context bias based DCCRN-KWS model to address this challenge. We form the whole architecture as a multi-task learning framework for both denoising and keyword spotting, where the DCCRN encoder is connected with the KWS model. Helped with the denoising task, we further introduce an audio context bias module to leverage the real keyword samples and bias the network to better discriminate keywords in noisy conditions. Feature merge and complex context linear modules are also introduced to strengthen such discrimination and to effectively leverage contextual information respectively. Experiments on an internal challenging dataset and the HIMIYA public dataset show that DCCRN-KWS is superior in performance, while the ablation study demonstrates the good design of the whole model.
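The abstract describes the architecture only at a high level. As a rough illustration of two generic ingredients it names, the sketch below shows an attention-style audio context bias, in which a frame embedding attends over embeddings of real keyword samples, and a weighted multi-task objective that combines a KWS loss with a denoising loss. Everything in it is an assumption: the scaled dot-product attention, the concatenation used as a stand-in for "feature merge", the loss weight of 0.5, and all function names (softmax, context_bias, merge_features, multitask_loss) are hypothetical. It is not the DCCRN-KWS implementation. Pure Python is used so it runs without dependencies.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_bias(query, keyword_embeddings):
    # Hypothetical audio context bias: scaled dot-product attention from one
    # frame embedding over keyword-sample embeddings of the same size d.
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, emb)) / math.sqrt(d)
        for emb in keyword_embeddings
    ]
    weights = softmax(scores)
    # Bias vector = attention-weighted sum of the keyword embeddings.
    return [
        sum(w * emb[i] for w, emb in zip(weights, keyword_embeddings))
        for i in range(d)
    ]


def merge_features(frame, bias):
    # Hypothetical stand-in for "feature merge": concatenate frame and bias.
    return frame + bias


def multitask_loss(kws_loss, denoise_loss, weight=0.5):
    # Hypothetical multi-task objective: KWS loss plus a weighted denoising loss.
    return kws_loss + weight * denoise_loss


# Usage: a frame closer to the first keyword embedding is biased toward it.
frame = [1.0, 0.0]
keywords = [[1.0, 0.0], [0.0, 1.0]]
bias = context_bias(frame, keywords)
print(bias)                         # first component is the larger one
print(merge_features(frame, bias))  # 4-dimensional merged feature
print(multitask_loss(1.0, 2.0))     # 2.0
```

According to the abstract, the frame features come from the DCCRN encoder. The abstract does not say how the keyword samples are embedded or how the two losses are weighted, so those parts are placeholders. Plain lists of floats stand in for batched tensors to keep the arithmetic easy to follow.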

Original language	English
Pages (from-to)	929-933
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2023-August
DOI	https://doi.org/10.21437/Interspeech.2023-1184
Publication status	Published - 2023
Event	24th International Speech Communication Association, Interspeech 2023 - Dublin, Ireland. Duration: 20 Aug 2023 → 24 Aug 2023

Access to Document

10.21437/Interspeech.2023-1184


Cite this

@article{25543e40bf9f4fa696628c91611e5606,

title = "DCCRN-KWS: An Audio Bias Based Model for Noise Robust Small-Footprint Keyword Spotting",

abstract = "Real-world complex acoustic environments especially the ones with a low signal-to-noise ratio (SNR) will bring tremendous challenges to a keyword spotting (KWS) system. Inspired by the recent advances of neural speech enhancement and context bias in speech recognition, we propose a robust audio context bias based DCCRN-KWS model to address this challenge. We form the whole architecture as a multi-task learning framework for both denoising and keyword spotting, where the DCCRN encoder is connected with the KWS model. Helped with the denoising task, we further introduce an audio context bias module to leverage the real keyword samples and bias the network to better discriminate keywords in noisy conditions. Feature merge and complex context linear modules are also introduced to strengthen such discrimination and to effectively leverage contextual information respectively. Experiments on an internal challenging dataset and the HIMIYA public dataset show that DCCRN-KWS is superior in performance, while the ablation study demonstrates the good design of the whole model.",

keywords = "audio context bias, DCCRN-KWS, keyword spotting, Speech enhancement",

author = "Shubo Lv and Xiong Wang and Sining Sun and Long Ma and Lei Xie",

note = "Publisher Copyright: {\textcopyright} 2023 International Speech Communication Association. All rights reserved.; 24th International Speech Communication Association, Interspeech 2023 ; Conference date: 20-08-2023 Through 24-08-2023",

year = "2023",

doi = "10.21437/Interspeech.2023-1184",

language = "English",

volume = "2023-August",

pages = "929--933",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY  - JOUR
T1  - DCCRN-KWS: An Audio Bias Based Model for Noise Robust Small-Footprint Keyword Spotting
T2  - 24th International Speech Communication Association, Interspeech 2023
AU  - Lv, Shubo
AU  - Wang, Xiong
AU  - Sun, Sining
AU  - Ma, Long
AU  - Xie, Lei
PY  - 2023
Y1  - 2023
N2  - Real-world complex acoustic environments especially the ones with a low signal-to-noise ratio (SNR) will bring tremendous challenges to a keyword spotting (KWS) system. Inspired by the recent advances of neural speech enhancement and context bias in speech recognition, we propose a robust audio context bias based DCCRN-KWS model to address this challenge. We form the whole architecture as a multi-task learning framework for both denoising and keyword spotting, where the DCCRN encoder is connected with the KWS model. Helped with the denoising task, we further introduce an audio context bias module to leverage the real keyword samples and bias the network to better discriminate keywords in noisy conditions. Feature merge and complex context linear modules are also introduced to strengthen such discrimination and to effectively leverage contextual information respectively. Experiments on an internal challenging dataset and the HIMIYA public dataset show that DCCRN-KWS is superior in performance, while the ablation study demonstrates the good design of the whole model.
AB  - Real-world complex acoustic environments especially the ones with a low signal-to-noise ratio (SNR) will bring tremendous challenges to a keyword spotting (KWS) system. Inspired by the recent advances of neural speech enhancement and context bias in speech recognition, we propose a robust audio context bias based DCCRN-KWS model to address this challenge. We form the whole architecture as a multi-task learning framework for both denoising and keyword spotting, where the DCCRN encoder is connected with the KWS model. Helped with the denoising task, we further introduce an audio context bias module to leverage the real keyword samples and bias the network to better discriminate keywords in noisy conditions. Feature merge and complex context linear modules are also introduced to strengthen such discrimination and to effectively leverage contextual information respectively. Experiments on an internal challenging dataset and the HIMIYA public dataset show that DCCRN-KWS is superior in performance, while the ablation study demonstrates the good design of the whole model.
KW  - audio context bias
KW  - DCCRN-KWS
KW  - keyword spotting
KW  - Speech enhancement
UR  - http://www.scopus.com/inward/record.url?scp=85171531476&partnerID=8YFLogxK
U2  - 10.21437/Interspeech.2023-1184
DO  - 10.21437/Interspeech.2023-1184
M3  - Conference article
AN  - SCOPUS:85171531476
SN  - 2308-457X
VL  - 2023-August
SP  - 929
EP  - 933
JO  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Y2  - 20 August 2023 through 24 August 2023
ER  - 
