Target Speaker Extraction for Customizable Query-by-Example Keyword Spotting

Qijie Shao, Jingyong Hou, Yanxin Hu, Qing Wang, Lei Xie, Xin Lei

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

3 Scopus citations

Abstract

To achieve a better user experience, it is desirable for a keyword spotting (KWS) system to be customizable, and Query-by-Example (QbE) is a promising way to achieve this. To reduce false alarms caused by interfering speech and ambient noise, we propose a speech enhancement frontend based on VoiceFilter for QbE-based KWS systems. VoiceFilter is a speaker extraction model that extracts the voice of a target speaker from a multi-speaker speech signal, given a reference signal from that speaker. In this paper, we substantially improve VoiceFilter to better fit the KWS scenario: it enhances the voice of the target speaker, suppresses the voices of non-target speakers, and also reduces ambient noise. To further reduce false rejections of the system with a VoiceFilter frontend, we apply exemplar augmentation, adding reverberation to the enrollment templates. Experiments show that the proposed method improves performance: compared with a DTW-based QbE system, our best system achieves a 39.0% relative reduction in false reject rate at a false alarm rate of 0.5 per hour.
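The abstract's baseline matches a spoken query against enrolled templates with dynamic time warping (DTW), and its exemplar augmentation adds reverberation to those templates. The following is a minimal sketch of both ideas under stated assumptions; it does not reproduce the paper's VoiceFilter frontend, features, or scoring. Assumed and hypothetical here: log-magnitude spectral features, cosine frame distance, whole-segment (not sliding-window) DTW over a pre-segmented query, synthetic exponentially decaying noise used as room impulse responses (RIRs), the RT60 values, and all function names (`synthetic_rir`, `add_reverb`, `features`, `dtw_distance`, `enroll`, `qbe_score`).

```python
import numpy as np


def synthetic_rir(sr: int, rt60: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for a measured RIR: white noise whose amplitude
    decays by 60 dB (factor 1e-3) after rt60 seconds."""
    n = int(sr * rt60)
    t = np.arange(n) / sr
    decay = np.exp(-np.log(1000.0) * t / rt60)
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct-path impulse
    return rir / np.max(np.abs(rir))


def add_reverb(wave: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve with the RIR, keep the original length, and restore the original RMS."""
    wet = np.convolve(wave, rir)[: len(wave)]
    scale = np.sqrt(np.mean(wave ** 2) / (np.mean(wet ** 2) + 1e-12))
    return wet * scale


def features(wave: np.ndarray, frame: int = 400, hop: int = 160) -> np.ndarray:
    """Log-magnitude spectra of Hann-windowed frames (assumed feature type).
    Requires len(wave) >= frame."""
    n_frames = 1 + (len(wave) - frame) // hop
    win = np.hanning(frame)
    frames = np.stack([wave[i * hop : i * hop + frame] * win for i in range(n_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-6)


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """DTW with cosine frame cost, normalized by len(a) + len(b)."""
    an = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    bn = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    cost = 1.0 - an @ bn.T
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m] / (n + m))


def enroll(waves, sr: int = 16000, rt60s=(0.3, 0.6), seed: int = 0):
    """Exemplar augmentation (sketch): keep each clean template and add one
    reverberant copy per assumed RT60 value."""
    rng = np.random.default_rng(seed)
    templates = []
    for w in waves:
        templates.append(features(w))
        for rt60 in rt60s:
            templates.append(features(add_reverb(w, synthetic_rir(sr, rt60, rng))))
    return templates


def qbe_score(query_wave: np.ndarray, templates) -> float:
    """Lower is a better match: minimum DTW distance over all templates."""
    q = features(query_wave)
    return min(dtw_distance(t, q) for t in templates)
```

A detection would be declared when `qbe_score` falls below a tuned threshold (not specified here). Keeping the clean template alongside its reverberant copies and taking the minimum distance is one simple way to make template matching more tolerant of reverberant test audio.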

Original language: English
Title of host publication: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 672-678
Number of pages: 7
ISBN (Electronic): 9789881476890
State: Published - 2021
Event: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Tokyo, Japan
Duration: 14 Dec 2021 to 17 Dec 2021

Publication series

Name: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings

Conference

Conference: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Country/Territory: Japan
City: Tokyo
Period: 14/12/21 to 17/12/21

Keywords

  • Front-end
  • Keyword spotting
  • Query-by-Example
  • Target Speaker Extraction
  • Wake-up word detection
