TY - GEN
T1 - Virtual Adversarial Training for DS-CNN Based Small-Footprint Keyword Spotting
AU - Wang, Xiong
AU - Sun, Sining
AU - Xie, Lei
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - Serving as the trigger of a voice-enabled user interface, an on-device keyword spotting (KWS) model has to be extremely compact, efficient and accurate. In this paper, we adopt a depth-wise separable convolutional neural network (DS-CNN) as our small-footprint KWS model, which is highly competitive to these ends. However, a recent study has shown that a compact KWS system is very vulnerable to small adversarial perturbations, while augmenting the training data with specifically-generated adversarial examples can improve performance. In this paper, we further improve KWS performance through a virtual adversarial training (VAT) solution. Instead of using adversarial examples for data augmentation, we propose to train a DS-CNN KWS model using adversarial regularization, which aims to smooth the model's distribution and thus improve robustness, by explicitly introducing a distribution smoothness measure into the loss function. Experiments on a KWS corpus collected with a circular microphone array in a far-field scenario show that the VAT approach brings a 31.9% relative false rejection rate (FRR) reduction compared to the normal training approach with cross-entropy loss, and it also surpasses the adversarial-example-based data augmentation approach with a 10.3% relative FRR reduction.
AB - Serving as the trigger of a voice-enabled user interface, an on-device keyword spotting (KWS) model has to be extremely compact, efficient and accurate. In this paper, we adopt a depth-wise separable convolutional neural network (DS-CNN) as our small-footprint KWS model, which is highly competitive to these ends. However, a recent study has shown that a compact KWS system is very vulnerable to small adversarial perturbations, while augmenting the training data with specifically-generated adversarial examples can improve performance. In this paper, we further improve KWS performance through a virtual adversarial training (VAT) solution. Instead of using adversarial examples for data augmentation, we propose to train a DS-CNN KWS model using adversarial regularization, which aims to smooth the model's distribution and thus improve robustness, by explicitly introducing a distribution smoothness measure into the loss function. Experiments on a KWS corpus collected with a circular microphone array in a far-field scenario show that the VAT approach brings a 31.9% relative false rejection rate (FRR) reduction compared to the normal training approach with cross-entropy loss, and it also surpasses the adversarial-example-based data augmentation approach with a 10.3% relative FRR reduction.
KW - depthwise separable convolutional neural network
KW - DS-CNN
KW - KWS
KW - virtual adversarial training
UR - http://www.scopus.com/inward/record.url?scp=85081614619&partnerID=8YFLogxK
U2 - 10.1109/ASRU46091.2019.9003745
DO - 10.1109/ASRU46091.2019.9003745
M3 - Conference contribution
AN - SCOPUS:85081614619
T3 - 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
SP - 607
EP - 612
BT - 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019
Y2 - 15 December 2019 through 18 December 2019
ER -