TY - JOUR
T1 - Adversarial regularization for end-to-end robust speaker verification
AU - Wang, Qing
AU - Guo, Pengcheng
AU - Sun, Sining
AU - Xie, Lei
AU - Hansen, John H.L.
N1 - Publisher Copyright:
Copyright © 2019 ISCA
PY - 2019
Y1 - 2019
N2 - Deep learning has been successfully applied to speaker verification (SV), especially in end-to-end SV systems, which have attracted increasing interest recently. It has been shown in both image and speech applications that deep neural networks are vulnerable to adversarial examples. In this study, we explore two methods to generate adversarial examples for advanced SV: (i) the fast gradient-sign method (FGSM) and (ii) the local distributional smoothness (LDS) method. To explore this issue, we use adversarial examples to attack an end-to-end SV system. Experiments show that the neural network can be easily disturbed by adversarial examples. Next, we propose to train a robust end-to-end SV model using the two types of adversarial examples for model regularization. Experimental results on the TIMIT dataset indicate that the regularized model improves EER on the original test set by a relative (i) +18.89% and (ii) +5.54%. In addition, the regularized model improves EER on the adversarial-example test set by a relative (i) +30.11% and (ii) +22.12%, suggesting more consistent performance against adversarial-example attacks.
AB - Deep learning has been successfully applied to speaker verification (SV), especially in end-to-end SV systems, which have attracted increasing interest recently. It has been shown in both image and speech applications that deep neural networks are vulnerable to adversarial examples. In this study, we explore two methods to generate adversarial examples for advanced SV: (i) the fast gradient-sign method (FGSM) and (ii) the local distributional smoothness (LDS) method. To explore this issue, we use adversarial examples to attack an end-to-end SV system. Experiments show that the neural network can be easily disturbed by adversarial examples. Next, we propose to train a robust end-to-end SV model using the two types of adversarial examples for model regularization. Experimental results on the TIMIT dataset indicate that the regularized model improves EER on the original test set by a relative (i) +18.89% and (ii) +5.54%. In addition, the regularized model improves EER on the adversarial-example test set by a relative (i) +30.11% and (ii) +22.12%, suggesting more consistent performance against adversarial-example attacks.
KW - Adversarial example
KW - Adversarial regularization
KW - End-to-end robust SV
KW - Fast gradient-sign method (FGSM)
KW - Local distributional smoothness (LDS)
UR - http://www.scopus.com/inward/record.url?scp=85074727570&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2019-2983
DO - 10.21437/Interspeech.2019-2983
M3 - Conference article
AN - SCOPUS:85074727570
SN - 2308-457X
VL - 2019-September
SP - 4010
EP - 4014
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019
Y2 - 15 September 2019 through 19 September 2019
ER -