TY - JOUR
T1 - Adversarial regularization for end-to-end robust speaker verification
AU - Wang, Qing
AU - Guo, Pengcheng
AU - Sun, Sining
AU - Xie, Lei
AU - Hansen, John H.L.
N1 - Publisher Copyright:
Copyright © 2019 ISCA
PY - 2019
Y1 - 2019
N2 - Deep learning has been successfully applied to speaker verification (SV), especially in end-to-end SV systems, which have attracted increasing interest recently. It has been shown in both image and speech applications that deep neural networks are vulnerable to adversarial examples. In this study, we explore two methods to generate adversarial examples for advanced SV: (i) the fast gradient-sign method (FGSM) and (ii) the local distributional smoothness (LDS) method. To explore this issue, we use adversarial examples to attack an end-to-end SV system. Experiments show that the neural network can be easily disturbed by adversarial examples. Next, we propose to train a robust end-to-end SV model using the two types of adversarial examples for model regularization. Experimental results on the TIMIT dataset indicate that the regularized model improves EER on the original test set by a relative (i) +18.89% and (ii) +5.54%. In addition, the regularized model improves EER on the adversarial-example test set by a relative (i) +30.11% and (ii) +22.12%, suggesting more consistent performance against adversarial-example attacks.
AB - Deep learning has been successfully applied to speaker verification (SV), especially in end-to-end SV systems, which have attracted increasing interest recently. It has been shown in both image and speech applications that deep neural networks are vulnerable to adversarial examples. In this study, we explore two methods to generate adversarial examples for advanced SV: (i) the fast gradient-sign method (FGSM) and (ii) the local distributional smoothness (LDS) method. To explore this issue, we use adversarial examples to attack an end-to-end SV system. Experiments show that the neural network can be easily disturbed by adversarial examples. Next, we propose to train a robust end-to-end SV model using the two types of adversarial examples for model regularization. Experimental results on the TIMIT dataset indicate that the regularized model improves EER on the original test set by a relative (i) +18.89% and (ii) +5.54%. In addition, the regularized model improves EER on the adversarial-example test set by a relative (i) +30.11% and (ii) +22.12%, suggesting more consistent performance against adversarial-example attacks.
KW - Adversarial example
KW - Adversarial regularization
KW - End-to-end robust SV
KW - Fast gradient-sign method (FGSM)
KW - Local distributional smoothness (LDS)
UR - http://www.scopus.com/inward/record.url?scp=85074727570&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2019-2983
DO - 10.21437/Interspeech.2019-2983
M3 - Conference article
AN - SCOPUS:85074727570
SN - 2308-457X
VL - 2019-September
SP - 4010
EP - 4014
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019
Y2 - 15 September 2019 through 19 September 2019
ER -