TY - GEN
T1 - Virtual Adversarial Training for DS-CNN Based Small-Footprint Keyword Spotting
AU - Wang, Xiong
AU - Sun, Sining
AU - Xie, Lei
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - Serving as the trigger of a voice-enabled user interface, an on-device keyword spotting (KWS) model has to be extremely compact, efficient and accurate. In this paper, we adopt a depth-wise separable convolutional neural network (DS-CNN) as our small-footprint KWS model, which is highly competitive to these ends. However, a recent study has shown that a compact KWS system is very vulnerable to small adversarial perturbations, while augmenting the training data with specifically-generated adversarial examples can improve performance. In this paper, we further improve KWS performance through a virtual adversarial training (VAT) solution. Instead of using adversarial examples for data augmentation, we propose to train a DS-CNN KWS model using adversarial regularization, which aims to smooth the model's distribution and thus improve robustness, by explicitly introducing a distribution smoothness measure into the loss function. Experiments on a KWS corpus collected with a circular microphone array in a far-field scenario show that the VAT approach brings a 31.9% relative false rejection rate (FRR) reduction compared to the normal training approach with cross-entropy loss, and it also surpasses the adversarial-example-based data augmentation approach with a 10.3% relative FRR reduction.
AB - Serving as the trigger of a voice-enabled user interface, an on-device keyword spotting (KWS) model has to be extremely compact, efficient and accurate. In this paper, we adopt a depth-wise separable convolutional neural network (DS-CNN) as our small-footprint KWS model, which is highly competitive to these ends. However, a recent study has shown that a compact KWS system is very vulnerable to small adversarial perturbations, while augmenting the training data with specifically-generated adversarial examples can improve performance. In this paper, we further improve KWS performance through a virtual adversarial training (VAT) solution. Instead of using adversarial examples for data augmentation, we propose to train a DS-CNN KWS model using adversarial regularization, which aims to smooth the model's distribution and thus improve robustness, by explicitly introducing a distribution smoothness measure into the loss function. Experiments on a KWS corpus collected with a circular microphone array in a far-field scenario show that the VAT approach brings a 31.9% relative false rejection rate (FRR) reduction compared to the normal training approach with cross-entropy loss, and it also surpasses the adversarial-example-based data augmentation approach with a 10.3% relative FRR reduction.
KW - depthwise separable convolutional neural network
KW - DS-CNN
KW - KWS
KW - virtual adversarial training
UR - http://www.scopus.com/inward/record.url?scp=85081614619&partnerID=8YFLogxK
U2 - 10.1109/ASRU46091.2019.9003745
DO - 10.1109/ASRU46091.2019.9003745
M3 - Conference contribution
AN - SCOPUS:85081614619
T3 - 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
SP - 607
EP - 612
BT - 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019
Y2 - 15 December 2019 through 18 December 2019
ER -