TY - GEN
T1 - Inaudible adversarial perturbations for targeted attack in speaker recognition
AU - Wang, Qing
AU - Guo, Pengcheng
AU - Xie, Lei
N1 - Publisher Copyright:
Copyright © 2020 ISCA
PY - 2020
Y1 - 2020
AB - Speaker recognition is a popular topic in biometric authentication, and many deep learning approaches have achieved extraordinary performance. However, it has been shown in both image and speech applications that deep neural networks are vulnerable to adversarial examples. In this study, we aim to exploit this weakness to perform targeted adversarial attacks against an x-vector based speaker recognition system. We propose to generate inaudible adversarial perturbations based on the psychoacoustic principle of frequency masking, achieving targeted white-box attacks against the speaker recognition system. Specifically, we constrain the perturbation to stay below the masking threshold of the original audio, instead of using a common ℓp norm to measure the perturbations. Experiments on the Aishell-1 corpus show that our approach yields up to a 98.5% attack success rate against target speakers of either gender, while remaining indistinguishable to listeners. Furthermore, we also achieve an effective speaker attack when applying the proposed approach to a completely irrelevant waveform, such as music.
KW - Adversarial example
KW - Inaudible
KW - Speaker recognition
KW - Targeted adversarial attack
UR - http://www.scopus.com/inward/record.url?scp=85098192009&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2020-1955
DO - 10.21437/Interspeech.2020-1955
M3 - Conference contribution
AN - SCOPUS:85098192009
SN - 9781713820697
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 4228
EP - 4232
BT - Interspeech 2020
PB - International Speech Communication Association
T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Y2 - 25 October 2020 through 29 October 2020
ER -