Leveraging Synthetic Speech for CIF-Based Customized Keyword Spotting

Shuiyun Liu, Ao Zhang, Kaixun Huang, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Customized keyword spotting aims to detect user-defined keywords in continuous speech, providing flexibility and personalization. Previous research has mainly relied on similarity calculations between keyword text and acoustic features. However, because of the gap between the two modalities, it is challenging to obtain alignment information and model their correlation. In this paper, we propose a novel method to address these issues. First, we introduce a text-to-speech (TTS) module to generate audio for the keywords, which addresses the cross-modal challenge of text-based customized keyword spotting. Second, we employ the Continuous Integrate-and-Fire (CIF) mechanism for boundary prediction to obtain token-level acoustic representations of keywords, thus solving the keyword-speech alignment problem. Experimental results on the Aishell-1 dataset demonstrate the effectiveness of the proposed method: it significantly outperforms both the baseline method and the Dynamic Sequence Partitioning (DSP) method in keyword spotting accuracy. Compared with the DSP method, our model achieves a relative improvement of 72.7% in wake-up rate when the false accept rate is fixed at 0.02. It also represents a 64% improvement over the baseline model.
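The integrate-and-fire step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it follows the general CIF formulation, in which per-frame weights are accumulated until they reach a threshold, at which point a token-level representation is emitted ("fired") and the leftover weight starts the next token. The function name `cif`, the threshold of 1.0, and the toy inputs are assumptions made for illustration, and each weight is assumed to be smaller than the threshold so that at most one token fires per frame.

```python
def cif(frames, alphas, threshold=1.0):
    """Illustrative Continuous Integrate-and-Fire over a frame sequence.

    frames: list of equal-length feature vectors (one per acoustic frame)
    alphas: list of per-frame weights, each assumed to lie in [0, threshold)
    Returns (tokens, boundaries): the fired token vectors and the frame
    index at which each token fired.
    """
    dim = len(frames[0])
    tokens, boundaries = [], []
    acc_weight = 0.0            # weight integrated so far for the current token
    acc_state = [0.0] * dim     # weighted sum of frames for the current token

    for t, (h, a) in enumerate(zip(frames, alphas)):
        if acc_weight + a < threshold:
            # Not enough evidence yet: keep integrating.
            acc_weight += a
            acc_state = [s + a * x for s, x in zip(acc_state, h)]
        else:
            # Fire: use only the share of this frame needed to reach the threshold.
            used = threshold - acc_weight
            tokens.append([s + used * x for s, x in zip(acc_state, h)])
            boundaries.append(t)
            # The remaining share of this frame starts the next token.
            remainder = a - used
            acc_weight = remainder
            acc_state = [remainder * x for x in h]
    return tokens, boundaries


# Toy example: four one-dimensional frames with hand-picked weights.
frames = [[1.0], [3.0], [2.0], [4.0]]
alphas = [0.5, 0.5, 0.25, 0.75]
tokens, boundaries = cif(frames, alphas)
print(tokens, boundaries)
```

In this toy run the weights sum to 2.0, so two tokens fire, one at frame 1 and one at frame 3. In a trained model the weights would be predicted from the encoder output rather than set by hand.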

Original language: English
Title of host publication: Man-Machine Speech Communication - 18th National Conference, NCMMSC 2023, Proceedings
Editors: Jia Jia, Zhenhua Ling, Xie Chen, Ya Li, Zixing Zhang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 354-365
Number of pages: 12
ISBN (Print): 9789819706006
State: Published - 2024
Event: 18th National Conference on Man-Machine Speech Communication, NCMMSC 2023 - Suzhou, China
Duration: 8 Dec 2023 to 11 Dec 2023

Publication series

Name: Communications in Computer and Information Science
Volume: 2006
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 18th National Conference on Man-Machine Speech Communication, NCMMSC 2023
Country/Territory: China
City: Suzhou
Period: 8/12/23 to 11/12/23

Keywords

  • Continuous Integrate-and-Fire
  • Keyword spotting
  • Speech synthesis
