Mining Effective Negative Training Samples for Keyword Spotting

Jingyong Hou, Yangyang Shi, Mari Ostendorf, Mei Yuh Hwang, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

21 Citations (Scopus)

Abstract

Max-pooling neural network architectures have been proven to be useful for keyword spotting (KWS), but standard training methods suffer from a class-imbalance problem when using all frames from negative utterances. To address the problem, we propose an innovative algorithm, Regional Hard-Example (RHE) mining, to find effective negative training samples, in order to control the ratio of negative vs. positive data. To maintain the diversity of the negative samples, multiple non-contiguous difficult frames per negative training utterance are dynamically selected during training, based on the model statistics at each training epoch. Further, to improve model learning, we introduce a weakly constrained max-pooling method for positive training utterances, which constrains max-pooling over the keyword ending frames only at early stages of training. Finally, data augmentation is combined to bring further improvement. We assess the algorithms by conducting experiments on wake-up word detection tasks with two different neural network architectures. The experiments consistently show that the proposed methods provide significant improvements compared to a strong baseline. At a false alarm rate of once per hour, our methods achieve 45-58% relative reduction in false rejection rates over a strong baseline.
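The two selection ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "difficulty" on a negative utterance is the per-frame keyword posterior, that non-contiguity is enforced by a greedy minimum-gap rule, and that the constrained phase for positive utterances ends after a fixed number of epochs. The function names and the parameters `num_samples`, `min_gap`, `ending_region`, and `constrained_epochs` are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_hard_negative_frames(
    posteriors: Sequence[float], num_samples: int, min_gap: int
) -> List[int]:
    """Greedily pick up to `num_samples` high-posterior frames at least `min_gap` apart.

    `posteriors` holds the model's keyword posterior for each frame of one
    negative utterance (assumed difficulty measure; higher means harder).
    """
    # Rank frames from most to least confusable.
    ranked = sorted(range(len(posteriors)), key=lambda t: posteriors[t], reverse=True)
    selected: List[int] = []
    for t in ranked:
        if len(selected) == num_samples:
            break
        # Skip frames that fall inside the region around an already chosen frame.
        if all(abs(t - s) >= min_gap for s in selected):
            selected.append(t)
    return sorted(selected)


def positive_pooling_frame(
    posteriors: Sequence[float],
    ending_region: Tuple[int, int],
    epoch: int,
    constrained_epochs: int,
) -> int:
    """Return the frame index used for max-pooling on one positive utterance.

    During the first `constrained_epochs` epochs (hypothetical schedule) the
    maximum is taken only over the keyword ending frames; afterwards it is
    taken over the whole utterance.
    """
    if epoch < constrained_epochs:
        start, end = ending_region  # inclusive frame bounds, assumed to come from an alignment
        candidates = range(start, end + 1)
    else:
        candidates = range(len(posteriors))
    return max(candidates, key=lambda t: posteriors[t])


# Example: re-run the selection at every epoch with the current model's posteriors.
negative_posteriors = [0.1, 0.9, 0.8, 0.2, 0.7, 0.05, 0.6]
hard_frames = select_hard_negative_frames(negative_posteriors, num_samples=3, min_gap=2)

positive_posteriors = [0.2, 0.95, 0.3, 0.4, 0.6, 0.5]
early_frame = positive_pooling_frame(positive_posteriors, (3, 5), epoch=0, constrained_epochs=2)
late_frame = positive_pooling_frame(positive_posteriors, (3, 5), epoch=5, constrained_epochs=2)
```

Because the posteriors change as the model trains, calling the selection once per epoch yields a different set of negative frames each time, which is one way to read "dynamically selected". Capping `num_samples` per negative utterance is what controls the ratio of negative to positive data in this sketch.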

Original language: English
Title of host publication: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7444-7448
Number of pages: 5
ISBN (Electronic): 9781509066315
DOI
Publication status: Published - May 2020
Event: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain
Duration: 4 May 2020 to 8 May 2020

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2020-May
ISSN (Print): 1520-6149

Conference

Conference: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Country/Territory: Spain
City: Barcelona
Period: 4/05/20 to 8/05/20
