Target Speaker Extraction for Customizable Query-by-Example Keyword Spotting

Qijie Shao; Jingyong Hou; Yanxin Hu; Qing Wang; Lei Xie; Xin Lei

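School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

3 Citations (Scopus)

Abstract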

To achieve a better user experience, it is desirable to have a customizable keyword spotting (KWS) system. Query-by-Example (QbE) is a promising way to achieve customization. To reduce the false alarms caused by interfering speech and ambient noise, we propose a speech enhancement frontend based on VoiceFilter for QbE-based KWS systems. VoiceFilter is a speaker extraction model that extracts the voice of a target speaker from multi-speaker speech signals, given a reference signal from the target speaker. In this paper, we improve VoiceFilter substantially to better fit the KWS scenario: it enhances the voice of the target speaker, suppresses the voices of non-target speakers, and reduces ambient noise. To further reduce false rejections of the system with a VoiceFilter frontend, we apply exemplar augmentation to add reverberation to enrollment templates. In our experiments, the proposed method improves performance. Compared with a DTW-based QbE system, our best system achieves a 39.0% relative reduction in false reject rate at a false alarm rate of 0.5 times per hour.
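The abstract names a DTW-based QbE system as the baseline but does not describe it. As a rough illustration only, the sketch below shows generic dynamic time warping (DTW) matching between an enrollment template and a test segment, each given as a sequence of feature frames. The function names (`frame_distance`, `dtw_distance`, `is_keyword`), the Euclidean frame distance, the length normalization, and the threshold are all assumptions made for this example and are not taken from the paper.

```python
import math
from typing import Sequence

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(template: Sequence[Frame], test: Sequence[Frame]) -> float:
    """Length-normalized DTW cost between two frame sequences."""
    n, m = len(template), len(test)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    # acc[i][j] holds the cheapest cost of aligning the first i template
    # frames with the first j test frames.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(template[i - 1], test[j - 1])
            # Allowed steps: advance the template, the test, or both.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Normalize by total length so long and short keywords are comparable.
    return acc[n][m] / (n + m)


def is_keyword(template: Sequence[Frame], test: Sequence[Frame], threshold: float) -> bool:
    """Hypothetical decision rule: accept when the DTW cost is under a threshold."""
    return dtw_distance(template, test) <= threshold
```

In a setup like this, raising the threshold trades fewer false rejections for more false alarms, which is why the abstract reports false reject rate at a fixed false alarm rate.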

Original language	English
Title of host publication	2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	672-678
Number of pages	7
ISBN (Electronic)	9789881476890
Publication status	Published - 2021
Event	2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Tokyo, Japan. Duration: 14 Dec 2021 → 17 Dec 2021

Publication series

Name	2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings

Conference

Conference	2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Country/Territory	Japan
City	Tokyo
Period	14/12/21 → 17/12/21

Cite this

Shao, Q., Hou, J., Hu, Y., Wang, Q., Xie, L., & Lei, X. (2021). Target Speaker Extraction for Customizable Query-by-Example Keyword Spotting. In 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings (pp. 672-678). (2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings). Institute of Electrical and Electronics Engineers Inc.

Shao, Qijie; Hou, Jingyong; Hu, Yanxin et al. / Target Speaker Extraction for Customizable Query-by-Example Keyword Spotting. 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2021. pp. 672-678 (2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings).

@inproceedings{419c94b6206c442bad364eab9757bcfa,

title = "Target Speaker Extraction for Customizable Query-by-Example Keyword Spotting",

abstract = "To achieve a better user experience, it is desirable to have a customizable keyword spotting (KWS) system. Query-by-Example (QbE) is a promising way to achieve customization. To reduce the false alarms caused by interfering speech and ambient noise, we propose a speech enhancement frontend based on VoiceFilter for QbE-based KWS systems. VoiceFilter is a speaker extraction model that extracts the voice of a target speaker from multi-speaker speech signals, given a reference signal from the target speaker. In this paper, we improve VoiceFilter substantially to better fit the KWS scenario: it enhances the voice of the target speaker, suppresses the voices of non-target speakers, and reduces ambient noise. To further reduce false rejections of the system with a VoiceFilter frontend, we apply exemplar augmentation to add reverberation to enrollment templates. In our experiments, the proposed method improves performance. Compared with a DTW-based QbE system, our best system achieves a 39.0% relative reduction in false reject rate at a false alarm rate of 0.5 times per hour.",

keywords = "Front-end, Keyword spotting, Query-by-Example, Target Speaker Extraction, Wake-up word detection",

author = "Qijie Shao and Jingyong Hou and Yanxin Hu and Qing Wang and Lei Xie and Xin Lei",

note = "Publisher Copyright: {\textcopyright} 2021 APSIPA.; 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 ; Conference date: 14-12-2021 Through 17-12-2021",

year = "2021",

language = "English",

series = "2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "672--678",

booktitle = "2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings",

}

Shao, Q, Hou, J, Hu, Y, Wang, Q, Xie, L & Lei, X 2021, Target Speaker Extraction for Customizable Query-by-Example Keyword Spotting. in 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings. 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings, Institute of Electrical and Electronics Engineers Inc., pp. 672-678, 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021, Tokyo, Japan, 14/12/21.

Target Speaker Extraction for Customizable Query-by-Example Keyword Spotting. / Shao, Qijie; Hou, Jingyong; Hu, Yanxin et al.
2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2021. pp. 672-678 (2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings).

TY - GEN

T1 - Target Speaker Extraction for Customizable Query-by-Example Keyword Spotting

AU - Shao, Qijie

AU - Hou, Jingyong

AU - Hu, Yanxin

AU - Wang, Qing

AU - Xie, Lei

AU - Lei, Xin

PY - 2021

Y1 - 2021

AB - To achieve a better user experience, it is desirable to have a customizable keyword spotting (KWS) system. Query-by-Example (QbE) is a promising way to achieve customization. To reduce the false alarms caused by interfering speech and ambient noise, we propose a speech enhancement frontend based on VoiceFilter for QbE-based KWS systems. VoiceFilter is a speaker extraction model that extracts the voice of a target speaker from multi-speaker speech signals, given a reference signal from the target speaker. In this paper, we improve VoiceFilter substantially to better fit the KWS scenario: it enhances the voice of the target speaker, suppresses the voices of non-target speakers, and reduces ambient noise. To further reduce false rejections of the system with a VoiceFilter frontend, we apply exemplar augmentation to add reverberation to enrollment templates. In our experiments, the proposed method improves performance. Compared with a DTW-based QbE system, our best system achieves a 39.0% relative reduction in false reject rate at a false alarm rate of 0.5 times per hour.

KW - Front-end

KW - Keyword spotting

KW - Query-by-Example

KW - Target Speaker Extraction

KW - Wake-up word detection

UR - http://www.scopus.com/inward/record.url?scp=85126720806&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85126720806

T3 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings

SP - 672

EP - 678

BT - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021

Y2 - 14 December 2021 through 17 December 2021

ER -

Shao Q, Hou J, Hu Y, Wang Q, Xie L, Lei X. Target Speaker Extraction for Customizable Query-by-Example Keyword Spotting. In 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2021. p. 672-678. (2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings).
