U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias

Ao Zhang; Pan Zhou; Kaixun Huang; Yong Zou; Ming Liu; Lei Xie

doi:10.1109/ASRU57964.2023.10389755


School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasing interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, which leaves the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabulary KWS (U2-KWS) framework inspired by the two-pass ASR model U2. Specifically, we employ the CTC branch as the first-stage model to detect potential keyword candidates and the decoder branch as the second-stage model to validate them. To enhance arbitrary customized keywords, we redesign the U2 training procedure for U2-KWS and inject keyword information into both branches via audio-text cross-attention. We perform experiments on our internal dataset and Aishell-1. The results show that U2-KWS achieves a significant relative wake-up rate improvement of 41% over traditional customized KWS systems when the false alarm rate is fixed at 0.5 per hour.
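The two-pass control flow described in the abstract has two stages. A cheap CTC pass proposes keyword candidates, and a decoder pass validates each candidate. The minimal sketch below illustrates that flow. It is not the authors' implementation. Greedy CTC decoding with contiguous token matching stands in for the first-pass search. The blank id 0, the `verifier` callable and the 0.5 threshold are hypothetical: `verifier` stands in for the attention-decoder score. The keyword-biasing cross-attention is omitted entirely.

```python
from typing import Callable, List, Sequence

BLANK = 0  # assumption: token id 0 is the CTC blank


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]]) -> List[int]:
    """Take the per-frame argmax, merge repeated ids, then drop blanks."""
    out: List[int] = []
    prev = None
    for probs in frame_probs:
        tok = max(range(len(probs)), key=probs.__getitem__)
        if tok != prev and tok != BLANK:
            out.append(tok)
        prev = tok
    return out


def contains_keyword(tokens: Sequence[int], keyword: Sequence[int]) -> bool:
    """True if the keyword token sequence occurs contiguously in tokens."""
    n = len(keyword)
    return any(list(tokens[i:i + n]) == list(keyword)
               for i in range(len(tokens) - n + 1))


def two_pass_kws(frame_probs: Sequence[Sequence[float]],
                 keyword: Sequence[int],
                 verifier: Callable[[Sequence[int]], float],
                 threshold: float = 0.5) -> bool:
    # Pass 1: the cheap CTC check decides whether a candidate exists at all.
    if not contains_keyword(ctc_greedy_decode(frame_probs), keyword):
        return False
    # Pass 2: a (hypothetical) decoder-style verifier scores the candidate.
    return verifier(keyword) >= threshold
```

In this sketch, the verifier only runs when the first pass fires. That is the usual motivation for a two-stage design: the more expensive model is invoked on a small fraction of the audio.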

Original language	English
Title of host publication	2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9798350306897
DOIs	https://doi.org/10.1109/ASRU57964.2023.10389755
State	Published - 2023
Event	2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023 - Taipei, Taiwan, Province of China
Duration	16 Dec 2023 → 20 Dec 2023

Publication series

Name	2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

Conference

Conference	2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
Country/Territory	Taiwan, Province of China
City	Taipei
Period	16/12/23 → 20/12/23

Keywords

customized keyword bias
multi-task learning
Open-vocabulary keyword spotting
U2-KWS


Cite this

Zhang, A., Zhou, P., Huang, K., Zou, Y., Liu, M., & Xie, L. (2023). U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias. In 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023 (2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU57964.2023.10389755

@inproceedings{5e5a9165d80945af8022726ac3b9242e,

title = "U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias",

abstract = "Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasing interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, which leaves the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabulary KWS (U2-KWS) framework inspired by the two-pass ASR model U2. Specifically, we employ the CTC branch as the first-stage model to detect potential keyword candidates and the decoder branch as the second-stage model to validate them. To enhance arbitrary customized keywords, we redesign the U2 training procedure for U2-KWS and inject keyword information into both branches via audio-text cross-attention. We perform experiments on our internal dataset and Aishell-1. The results show that U2-KWS achieves a significant relative wake-up rate improvement of 41\% over traditional customized KWS systems when the false alarm rate is fixed at 0.5 per hour.",

keywords = "customized keyword bias, multi-task learning, Open-vocabulary keyword spotting, U2-KWS",

author = "Ao Zhang and Pan Zhou and Kaixun Huang and Yong Zou and Ming Liu and Lei Xie",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023 ; Conference date: 16-12-2023 Through 20-12-2023",

year = "2023",

doi = "10.1109/ASRU57964.2023.10389755",

language = "English",

series = "2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023",

}

Zhang, A, Zhou, P, Huang, K, Zou, Y, Liu, M & Xie, L 2023, U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias. in 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023. 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023, Institute of Electrical and Electronics Engineers Inc., 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023, Taipei, Taiwan, Province of China, 16/12/23. https://doi.org/10.1109/ASRU57964.2023.10389755

U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias. / Zhang, Ao; Zhou, Pan; Huang, Kaixun et al.
2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023. Institute of Electrical and Electronics Engineers Inc., 2023. (2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023).


TY - GEN

T1 - U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias

T2 - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

AU - Zhang, Ao

AU - Zhou, Pan

AU - Huang, Kaixun

AU - Zou, Yong

AU - Liu, Ming

AU - Xie, Lei

PY - 2023

Y1 - 2023

N2 - Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasing interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, which leaves the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabulary KWS (U2-KWS) framework inspired by the two-pass ASR model U2. Specifically, we employ the CTC branch as the first-stage model to detect potential keyword candidates and the decoder branch as the second-stage model to validate them. To enhance arbitrary customized keywords, we redesign the U2 training procedure for U2-KWS and inject keyword information into both branches via audio-text cross-attention. We perform experiments on our internal dataset and Aishell-1. The results show that U2-KWS achieves a significant relative wake-up rate improvement of 41% over traditional customized KWS systems when the false alarm rate is fixed at 0.5 per hour.

AB - Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasing interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, which leaves the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabulary KWS (U2-KWS) framework inspired by the two-pass ASR model U2. Specifically, we employ the CTC branch as the first-stage model to detect potential keyword candidates and the decoder branch as the second-stage model to validate them. To enhance arbitrary customized keywords, we redesign the U2 training procedure for U2-KWS and inject keyword information into both branches via audio-text cross-attention. We perform experiments on our internal dataset and Aishell-1. The results show that U2-KWS achieves a significant relative wake-up rate improvement of 41% over traditional customized KWS systems when the false alarm rate is fixed at 0.5 per hour.

KW - customized keyword bias

KW - multi-task learning

KW - Open-vocabulary keyword spotting

KW - U2-KWS

UR - http://www.scopus.com/inward/record.url?scp=85184659352&partnerID=8YFLogxK

U2 - 10.1109/ASRU57964.2023.10389755

DO - 10.1109/ASRU57964.2023.10389755

M3 - Conference contribution

AN - SCOPUS:85184659352

T3 - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

BT - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 16 December 2023 through 20 December 2023

ER -

Zhang A, Zhou P, Huang K, Zou Y, Liu M, Xie L. U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias. In 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023. Institute of Electrical and Electronics Engineers Inc. 2023. (2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023). doi: 10.1109/ASRU57964.2023.10389755
