TY - GEN
T1 - Target Speaker Extraction for Customizable Query-by-Example Keyword Spotting
AU - Shao, Qijie
AU - Hou, Jingyong
AU - Hu, Yanxin
AU - Wang, Qing
AU - Xie, Lei
AU - Lei, Xin
N1 - Publisher Copyright: © 2021 APSIPA.
PY - 2021
Y1 - 2021
N2 - To achieve a better user experience, it is desirable to have a customizable keyword spotting (KWS) system. Query-by-Example (QbE) is a promising way to achieve customization. To reduce the false alarms caused by interfering speech and ambient noise, we propose a speech enhancement frontend based on VoiceFilter for a QbE-based KWS system. VoiceFilter is a speaker extraction model that extracts the voice of a target speaker from multi-speaker speech signals, using a reference signal from the target speaker. In this paper, we improve VoiceFilter substantially to better fit the KWS scenario: it enhances the voice of the target speaker, suppresses the voices of non-target speakers, and reduces ambient noise as well. To further reduce false rejections of the system with a VoiceFilter frontend, we apply exemplar augmentation to add reverberation to enrollment templates. Our proposed method leads to improved performance according to our experiments. Compared with a DTW-based QbE system, our best system achieves a 39.0% relative reduction in false reject rate at a false alarm rate of 0.5 times per hour.
KW - Front-end
KW - Keyword spotting
KW - Query-by-Example
KW - Target Speaker Extraction
KW - Wake-up word detection
UR - http://www.scopus.com/inward/record.url?scp=85126720806&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85126720806
T3 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
SP - 672
EP - 678
BT - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Y2 - 14 December 2021 through 17 December 2021
ER -