TY - JOUR
T1 - Symmetric Saliency-Based Adversarial Attack to Speaker Identification
AU - Yao, Jiadi
AU - Chen, Xing
AU - Zhang, Xiao Lei
AU - Zhang, Wei Qiang
AU - Yang, Kunde
N1 - Publisher Copyright: © 1994-2012 IEEE.
PY - 2023
Y1 - 2023
N2 - To our knowledge, existing adversarial attack approaches to speaker identification either incur a high computational cost or are not very effective. To address this issue, this letter proposes a novel generation-network-based approach, called the symmetric saliency-based encoder-decoder (SSED), to generate adversarial voice examples against speaker identification. It contains two novel components. First, it uses a saliency map decoder to learn the importance of speech samples to the decision of a targeted speaker identification system, so that the attacker focuses on adding artificial noise to the important samples. Second, it uses an angular loss function to push the speaker embedding far away from that of the source speaker. Experimental results demonstrate that the proposed SSED yields state-of-the-art performance, i.e., a targeted attack success rate of over 97% and a signal-to-noise ratio of over 39 dB on both the open-set and closed-set speaker identification tasks, at a low computational cost.
KW - Adversarial attack
KW - angular loss
KW - saliency map decoder
KW - speaker identification
UR - http://www.scopus.com/inward/record.url?scp=85147281225&partnerID=8YFLogxK
U2 - 10.1109/LSP.2023.3236509
DO - 10.1109/LSP.2023.3236509
M3 - Article
AN - SCOPUS:85147281225
SN - 1070-9908
VL - 30
SP - 1
EP - 5
JO - IEEE Signal Processing Letters
JF - IEEE Signal Processing Letters
ER -