Abstract
Deep learning has been successfully applied to speaker verification (SV), especially in end-to-end SV systems, which have recently attracted growing interest. It has been shown in both image and speech applications that deep neural networks are vulnerable to adversarial examples. In this study, we explore two methods for generating adversarial examples against advanced SV: (i) the fast gradient-sign method (FGSM) and (ii) the local distributional smoothness (LDS) method. We first use these adversarial examples to attack an end-to-end SV system; experiments show that the neural network is easily disturbed by them. We then propose training a robust end-to-end SV model that uses both kinds of adversarial examples for regularization. Experimental results on the TIMIT dataset indicate that the regularized model improves the equal error rate (EER) on the original test set by a relative (i) 18.89% and (ii) 5.54%. In addition, the regularized model improves the EER on the adversarial-example test set by a relative (i) 30.11% and (ii) 22.12%, suggesting more consistent performance under adversarial-example attacks.
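For context, the sketch below illustrates the two methods named in the abstract. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes a PyTorch classifier `model` standing in for the end-to-end SV network, and the perturbation sizes `epsilon` and `xi` are illustrative placeholders rather than the paper's hyperparameters.

```python
# Hedged sketch: a generic classifier stands in for the paper's
# end-to-end SV model; all hyperparameter values are illustrative.
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.002):
    """(i) FGSM: shift the input by epsilon in the sign of the loss
    gradient, the direction that locally increases the loss most."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()

def lds_loss(model, x, xi=1e-6, epsilon=0.002, n_power=1):
    """(ii) LDS: KL divergence between the model's output distribution
    at x and at a virtual-adversarially perturbed x, where the worst-case
    direction is approximated by power iteration (no labels needed)."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)      # reference distribution
    d = torch.randn_like(x)                 # random initial direction
    for _ in range(n_power):
        d = (xi * F.normalize(d.flatten(1), dim=1).view_as(x)).requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x + d), dim=1), p,
                      reduction="batchmean")
        d = torch.autograd.grad(kl, d)[0]   # direction of steepest KL growth
    r_adv = epsilon * F.normalize(d.flatten(1), dim=1).view_as(x)
    return F.kl_div(F.log_softmax(model(x + r_adv), dim=1), p,
                    reduction="batchmean")
```

In adversarial regularization, the clean training loss would then be combined with a loss on FGSM examples and/or the LDS term, e.g. `loss = clean_loss + alpha * lds_loss(model, x)`; the weight `alpha` is again an assumed placeholder, not a value taken from the paper.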
| Original language | English |
| --- | --- |
| Pages (from-to) | 4010-4014 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Volume | 2019-September |
| DOIs | |
| State | Published - 2019 |
| Event | 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019, Graz, Austria, 15 Sep 2019 → 19 Sep 2019 |
Keywords
- Adversarial example
- Adversarial regularization
- End-to-end robust SV
- Fast gradient-sign method (FGSM)
- Local distributional smoothness (LDS)