Timbre-Reserved Adversarial Attack in Speaker Identification

Qing Wang, Jixun Yao, Li Zhang, Pengcheng Guo, Lei Xie

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

As a type of biometric identification, speaker identification (SID) systems face various attacks. Spoofing attacks imitate the target speaker's timbre, while adversarial attacks confuse SID systems with well-designed perturbations. Spoofing mimics the victim's timbre but does not exploit the vulnerabilities of the SID model, so it may not achieve the attacker's goal. Adversarial attacks, on the other hand, can lead the SID system to a decision but may not meet the specific text or speaker timbre requirements of certain attack scenarios. In this study, we propose a timbre-reserved adversarial attack in speaker identification that leverages SID model vulnerabilities while preserving the target speaker's timbre. We generate timbre-reserved adversarial audio by adding an adversarial constraint during different training stages of the voice conversion (VC) model. This constraint uses the target speaker label to optimize adversarial perturbations in the VC model's representations and is implemented through a speaker classifier integrated into VC model training. The constraint helps steer the VC model toward generating speaker-wise audio. Ultimately, inference with the VC model produces timbre-reserved adversarial audio capable of deceiving the SID system. Experimental results on the Audio Deepfake Detection (ADD) challenge dataset demonstrate that our method significantly improves the attack success rate compared to the vanilla VC model, without introducing additional adversarial noise into the attack speech. Objective and subjective evaluations confirm that the fake audio generated by our approach is of higher quality than audio obtained by directly adding adversarial perturbations to VC-generated audio. Additionally, our analysis indicates that the generated adversarial fake audio meets the attacker's specified text and target speaker timbre requirements.
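The idea of an adversarial constraint added to VC training can be illustrated with a minimal sketch. The abstract does not give the loss formulation, so everything here is an assumption made for illustration: the L1 reconstruction term, the cross-entropy form of the speaker-classifier constraint, the `weight` parameter, and all function names are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of softmax(logits) against a single target class index."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def l1_reconstruction(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute error between predicted and reference acoustic features."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def constrained_vc_loss(
    predicted: Sequence[float],
    reference: Sequence[float],
    speaker_logits: Sequence[float],
    target_speaker: int,
    weight: float = 1.0,
) -> float:
    """Hypothetical combined objective (an assumption, not the paper's formula):
    a VC reconstruction term plus a weighted speaker-classifier term that
    pushes the generated audio toward the target speaker label."""
    vc_term = l1_reconstruction(predicted, reference)
    adversarial_term = cross_entropy(speaker_logits, target_speaker)
    return vc_term + weight * adversarial_term


if __name__ == "__main__":
    # Toy values only: two feature frames and a two-speaker classifier.
    loss = constrained_vc_loss([1.0, 2.0], [1.0, 3.0], [0.0, 0.0], 0, weight=2.0)
    print(f"combined loss: {loss:.4f}")
```

In this sketch the classifier term shrinks as the classifier's score for the target speaker grows, so minimizing the sum trades off reconstruction quality against how strongly the output is classified as the target speaker.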

Original language: English
Pages (from-to): 3848-3858
Number of pages: 11
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 31
State: Published - 2023

Keywords

  • Adversarial attack
  • speaker identification
  • timbre-reserved
  • voice conversion
