Symmetric Saliency-Based Adversarial Attack to Speaker Identification

Jiadi Yao; Xing Chen; Xiao Lei Zhang; Wei Qiang Zhang; Kunde Yang

doi:10.1109/LSP.2023.3236509

Ocean Institute

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)

Abstract

Adversarial attack approaches to speaker identification either need high computational cost or are not very effective, to our knowledge. To address this issue, in this letter, we propose a novel generation-network-based approach, called symmetric saliency-based encoder-decoder (SSED), to generate adversarial voice examples to speaker identification. It contains two novel components. First, it uses a novel saliency map decoder to learn the importance of speech samples to the decision of a targeted speaker identification system, so as to make the attacker focus on generating artificial noise to the important samples. It also proposes an angular loss function to push the speaker embedding far away from the source speaker. Our experimental results demonstrate that the proposed SSED yields the state-of-the-art performance, i.e. over 97% targeted attack success rate and a signal-to-noise level of over 39 dB on both the open-set and close-set speaker identification tasks, with a low computational cost.
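The abstract names three quantities without defining them: a signal-to-noise level in dB, an angular relationship between speaker embeddings, and a saliency map that concentrates noise on important samples. The following is a minimal illustrative sketch of how such quantities are commonly computed, not the authors' implementation. The function names `snr_db`, `embedding_angle`, and `apply_saliency` are hypothetical, and the element-wise weighting of noise by a saliency map in [0, 1] is an assumption made only for illustration.

```python
import numpy as np


def snr_db(clean, adversarial):
    """Signal-to-noise ratio in dB, treating (adversarial - clean) as the noise."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(adversarial, dtype=np.float64) - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))


def embedding_angle(a, b):
    """Angle in radians between two embedding vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def apply_saliency(clean, noise, saliency):
    """Assumed form: add noise weighted per sample by a saliency map in [0, 1]."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    saliency = np.asarray(saliency, dtype=np.float64)
    return clean + saliency * noise
```

Under this definition, an SNR of 39 dB means the noise power is roughly 10^(-3.9), or about 1.26e-4, of the signal power.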

Original language	English
Pages (from-to)	1-5
Number of pages	5
Journal	IEEE Signal Processing Letters
Volume	30
DOI	https://doi.org/10.1109/LSP.2023.3236509
Publication status	Published - 2023

Cite this

@article{de7b890653cc414c8ac3d613e85f5f13,

title = "Symmetric Saliency-Based Adversarial Attack to Speaker Identification",

abstract = "Adversarial attack approaches to speaker identification either need high computational cost or are not very effective, to our knowledge. To address this issue, in this letter, we propose a novel generation-network-based approach, called symmetric saliency-based encoder-decoder (SSED), to generate adversarial voice examples to speaker identification. It contains two novel components. First, it uses a novel saliency map decoder to learn the importance of speech samples to the decision of a targeted speaker identification system, so as to make the attacker focus on generating artificial noise to the important samples. It also proposes an angular loss function to push the speaker embedding far away from the source speaker. Our experimental results demonstrate that the proposed SSED yields the state-of-the-art performance, i.e. over 97% targeted attack success rate and a signal-to-noise level of over 39 dB on both the open-set and close-set speaker identification tasks, with a low computational cost.",

keywords = "Adversarial attack, angular loss, saliency map decoder, speaker identification",

author = "Jiadi Yao and Xing Chen and Zhang, {Xiao Lei} and Zhang, {Wei Qiang} and Kunde Yang",

note = "Publisher Copyright: {\textcopyright} 1994-2012 IEEE.",

year = "2023",

doi = "10.1109/LSP.2023.3236509",

language = "English",

volume = "30",

pages = "1--5",

journal = "IEEE Signal Processing Letters",

issn = "1070-9908",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Symmetric Saliency-Based Adversarial Attack to Speaker Identification

AU - Yao, Jiadi

AU - Chen, Xing

AU - Zhang, Xiao Lei

AU - Zhang, Wei Qiang

AU - Yang, Kunde

PY - 2023

Y1 - 2023

N2 - Adversarial attack approaches to speaker identification either need high computational cost or are not very effective, to our knowledge. To address this issue, in this letter, we propose a novel generation-network-based approach, called symmetric saliency-based encoder-decoder (SSED), to generate adversarial voice examples to speaker identification. It contains two novel components. First, it uses a novel saliency map decoder to learn the importance of speech samples to the decision of a targeted speaker identification system, so as to make the attacker focus on generating artificial noise to the important samples. It also proposes an angular loss function to push the speaker embedding far away from the source speaker. Our experimental results demonstrate that the proposed SSED yields the state-of-the-art performance, i.e. over 97% targeted attack success rate and a signal-to-noise level of over 39 dB on both the open-set and close-set speaker identification tasks, with a low computational cost.

AB - Adversarial attack approaches to speaker identification either need high computational cost or are not very effective, to our knowledge. To address this issue, in this letter, we propose a novel generation-network-based approach, called symmetric saliency-based encoder-decoder (SSED), to generate adversarial voice examples to speaker identification. It contains two novel components. First, it uses a novel saliency map decoder to learn the importance of speech samples to the decision of a targeted speaker identification system, so as to make the attacker focus on generating artificial noise to the important samples. It also proposes an angular loss function to push the speaker embedding far away from the source speaker. Our experimental results demonstrate that the proposed SSED yields the state-of-the-art performance, i.e. over 97% targeted attack success rate and a signal-to-noise level of over 39 dB on both the open-set and close-set speaker identification tasks, with a low computational cost.

KW - Adversarial attack

KW - angular loss

KW - saliency map decoder

KW - speaker identification

UR - http://www.scopus.com/inward/record.url?scp=85147281225&partnerID=8YFLogxK

U2 - 10.1109/LSP.2023.3236509

DO - 10.1109/LSP.2023.3236509

M3 - Article

AN - SCOPUS:85147281225

SN - 1070-9908

VL - 30

SP - 1

EP - 5

JO - IEEE Signal Processing Letters

JF - IEEE Signal Processing Letters

ER -
