Abstract
As a form of biometric identification, speaker identification (SID) systems must be secure. To better understand the robustness of SID systems, we aim to perform more realistic attacks on SID that are challenging for both humans and machines to detect. In this study, we propose DiffAttack, a novel timbre-reserved adversarial attack approach that exploits the capability of a diffusion-based voice conversion (DiffVC) model to generate adversarial fake audio with distinct target speaker attribution. By introducing adversarial constraints into the generative process of the DiffVC model, we aim to craft fake samples that effectively mislead target models while preserving speaker-wise characteristics. Specifically, inspired by the use of randomly sampled Gaussian noise in both conventional adversarial attacks and diffusion processes, we incorporate adversarial constraints into the reverse diffusion process. As a result, these constraints subtly guide the reverse diffusion process toward the target speaker distribution. Our experiments on the LibriTTS dataset indicate that the proposed DiffAttack significantly improves the attack success rate compared with vanilla DiffVC and other methods. Furthermore, objective and subjective evaluations demonstrate that introducing adversarial constraints does not compromise the quality of the speech generated by the DiffVC model.
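The idea of adding an adversarial constraint to the reverse diffusion process can be illustrated with a toy sketch. This is not the DiffAttack implementation: the noise predictor (exact only for data drawn from N(mu, I)), the two-class linear-softmax "speaker classifier" `W`, the `guidance` scale, and all function names are assumptions made for illustration. The sketch follows the general pattern of classifier-guided sampling, where each reverse step is nudged along the gradient of the target class log-probability.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def target_logprob_grad(x, W, target):
    # Gradient of log p(target | x) for a linear-softmax classifier.
    # Hypothetical stand-in for the gradient of a real SID model.
    p = softmax(W @ x)
    return W[target] - p @ W


def reverse_diffusion(x_T, betas, mu, W, target, guidance=0.0, seed=0):
    rng = np.random.default_rng(seed)
    alpha_bars = np.cumprod(1.0 - betas)
    x = x_T.copy()
    for t in range(len(betas) - 1, -1, -1):
        beta, ab = betas[t], alpha_bars[t]
        # Toy noise predictor: exact when clean data ~ N(mu, I).
        eps_hat = np.sqrt(1.0 - ab) * (x - np.sqrt(ab) * mu)
        # Standard DDPM posterior mean for this step.
        mean = (x - beta / np.sqrt(1.0 - ab) * eps_hat) / np.sqrt(1.0 - beta)
        # Adversarial constraint: shift the mean toward the target class.
        mean = mean + guidance * beta * target_logprob_grad(x, W, target)
        # No noise is added on the final step (t == 0).
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(beta) * noise
    return x


# Toy setup: 8-dimensional "features" and two hypothetical speakers.
dim = 8
setup_rng = np.random.default_rng(1)
mu = setup_rng.standard_normal(dim)
W = setup_rng.standard_normal((2, dim))
betas = np.linspace(1e-4, 0.05, 50)
x_T = setup_rng.standard_normal(dim)

# Same starting point and same noise seed, with and without the constraint.
x_plain = reverse_diffusion(x_T, betas, mu, W, target=1, guidance=0.0)
x_adv = reverse_diffusion(x_T, betas, mu, W, target=1, guidance=2.0)

p_plain = softmax(W @ x_plain)[1]
p_adv = softmax(W @ x_adv)[1]
print("target probability without constraint:", p_plain)
print("target probability with constraint:   ", p_adv)
```

Because both runs share the same noise, any difference in the classifier output comes from the guidance term alone. In a real system the linear classifier would be replaced by the gradient of an SID model and the toy noise predictor by a trained voice conversion network.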
| Original language | English |
|---|---|
| Journal | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
| State | Published - 2025 |
| Event | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India. Duration: 6 Apr 2025 → 11 Apr 2025 |
Keywords
- adversarial attack
- diffusion model
- speaker identification
- voice conversion
Fingerprint
Dive into the research topics of 'DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification'. Together they form a unique fingerprint.