TY - GEN
T1 - U2-KWS
T2 - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
AU - Zhang, Ao
AU - Zhou, Pan
AU - Huang, Kaixun
AU - Zou, Yong
AU - Liu, Ming
AU - Xie, Lei
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasingly more interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, making the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabulary KWS (U2-KWS) framework inspired by the two-pass ASR model U2. Specifically, we employ the CTC branch as the first stage model to detect potential keyword candidates and the decoder branch as the second stage model to validate candidates. In order to enhance any customized keywords, we redesign the U2 training procedure for U2-KWS and add keyword information by audio and text cross-attention into both branches. We perform experiments on our internal dataset and Aishell-1. The results show that U2-KWS can achieve a significant relative wake-up rate improvement of 41 % compared to the traditional customized KWS systems when the false alarm rate is fixed to 0.5 times per hour.
AB - Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasingly more interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, making the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabulary KWS (U2-KWS) framework inspired by the two-pass ASR model U2. Specifically, we employ the CTC branch as the first stage model to detect potential keyword candidates and the decoder branch as the second stage model to validate candidates. In order to enhance any customized keywords, we redesign the U2 training procedure for U2-KWS and add keyword information by audio and text cross-attention into both branches. We perform experiments on our internal dataset and Aishell-1. The results show that U2-KWS can achieve a significant relative wake-up rate improvement of 41 % compared to the traditional customized KWS systems when the false alarm rate is fixed to 0.5 times per hour.
KW - customized keyword bias
KW - multi-task learning
KW - Open-vocabulary keyword spotting
KW - U2-KWS
UR - http://www.scopus.com/inward/record.url?scp=85184659352&partnerID=8YFLogxK
U2 - 10.1109/ASRU57964.2023.10389755
DO - 10.1109/ASRU57964.2023.10389755
M3 - 会议稿件
AN - SCOPUS:85184659352
T3 - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
BT - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 16 December 2023 through 20 December 2023
ER -