Distinctive and Natural Speaker Anonymization via Singular Value Transformation-Assisted Matrix

Jixun Yao; Qing Wang; Pengcheng Guo; Ziqian Ning; Lei Xie

doi:10.1109/TASLP.2024.3407600

School of Computer Science

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Speaker anonymization is an effective privacy protection solution that aims to conceal the speaker's identity while preserving the naturalness and distinctiveness of the original speech. Mainstream approaches use an utterance-level vector from a pre-trained automatic speaker verification (ASV) model to represent speaker identity, which is then averaged or modified for anonymization. However, these systems suffer from deterioration in the naturalness of anonymized speech, degradation in speaker distinctiveness, and severe privacy leakage against powerful attackers. To address these issues, and especially to generate more natural and distinctive anonymized speech, we propose a novel speaker anonymization approach that models a matrix related to speaker identity and transforms it into an anonymized singular value transformation-assisted matrix to conceal the original speaker identity. Our approach extracts frame-level speaker vectors from a pre-trained ASV model and employs an attention mechanism to create a speaker-score matrix and speaker-related tokens. Notably, the speaker-score matrix acts as the weight for the corresponding speaker-related token, representing the speaker's identity. The singular value transformation-assisted matrix is generated by recomposing the orthonormal eigenvector matrices obtained through Singular Value Decomposition (SVD) with the non-linearly transformed singular values. This process prevents the degradation of speaker distinctiveness caused by the introduction of other speakers' identity information. By multiplying the singular value transformation-assisted matrix and speaker-related tokens, we generate the anonymized speaker identity representation, thereby producing anonymized speech that is both natural and distinctive. Experiments on VoicePrivacy Challenge datasets demonstrate the effectiveness of our approach in protecting speaker privacy under all attack scenarios while maintaining speech naturalness and distinctiveness.
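The recomposition step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. The matrix shapes, the function name `anonymize_identity`, the choice to decompose the speaker-score matrix itself, and the power-law transform controlled by `gamma` are all assumptions made for illustration, since the abstract does not specify the non-linear function or the dimensions.

```python
import numpy as np


def anonymize_identity(score, tokens, gamma=0.5):
    """Illustrative sketch only (hypothetical helper, not the paper's code).

    score:  (T, K) speaker-score matrix, one weight per frame and token (assumed shape)
    tokens: (K, D) speaker-related tokens (assumed shape)
    gamma:  exponent of a placeholder non-linear transform (assumption)
    """
    # Decompose the score matrix: score = U @ diag(s) @ Vt,
    # where U and Vt have orthonormal columns/rows.
    u, s, vt = np.linalg.svd(score, full_matrices=False)

    # Placeholder non-linear transform of the singular values.
    # The abstract only states that the transform is non-linear.
    s_transformed = s ** gamma

    # Recompose with the original orthonormal factors and the new singular values.
    score_anonymized = (u * s_transformed) @ vt

    # Weight the speaker-related tokens with the transformed matrix.
    return score_anonymized @ tokens


rng = np.random.default_rng(0)
score = rng.random((4, 6))            # 4 frames, 6 tokens (toy sizes)
tokens = rng.standard_normal((6, 8))  # 6 tokens, 8-dimensional
anon = anonymize_identity(score, tokens)
print(anon.shape)
```

Because only the singular values change while the orthonormal factors are reused, no information from other speakers enters the representation, which matches the motivation given in the abstract. With `gamma=1.0` the sketch reduces to the unmodified product `score @ tokens`.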

Original language	English
Pages (from-to)	2944-2956
Number of pages	13
Journal	IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume	32
DOIs	https://doi.org/10.1109/TASLP.2024.3407600
State	Published - 2024

Keywords

Speaker anonymization
VoicePrivacy challenge
privacy protection
singular value decomposition (SVD)

Access to Document

10.1109/TASLP.2024.3407600

Cite this

@article{801ae7d1923c4a20951134b6a654c305,

title = "Distinctive and Natural Speaker Anonymization via Singular Value Transformation-Assisted Matrix",

abstract = "Speaker anonymization is an effective privacy protection solution that aims to conceal the speaker's identity while preserving the naturalness and distinctiveness of the original speech. Mainstream approaches use an utterance-level vector from a pre-trained automatic speaker verification (ASV) model to represent speaker identity, which is then averaged or modified for anonymization. However, these systems suffer from deterioration in the naturalness of anonymized speech, degradation in speaker distinctiveness, and severe privacy leakage against powerful attackers. To address these issues, and especially to generate more natural and distinctive anonymized speech, we propose a novel speaker anonymization approach that models a matrix related to speaker identity and transforms it into an anonymized singular value transformation-assisted matrix to conceal the original speaker identity. Our approach extracts frame-level speaker vectors from a pre-trained ASV model and employs an attention mechanism to create a speaker-score matrix and speaker-related tokens. Notably, the speaker-score matrix acts as the weight for the corresponding speaker-related token, representing the speaker's identity. The singular value transformation-assisted matrix is generated by recomposing the orthonormal eigenvector matrices obtained through Singular Value Decomposition (SVD) with the non-linearly transformed singular values. This process prevents the degradation of speaker distinctiveness caused by the introduction of other speakers' identity information. By multiplying the singular value transformation-assisted matrix and speaker-related tokens, we generate the anonymized speaker identity representation, thereby producing anonymized speech that is both natural and distinctive. Experiments on VoicePrivacy Challenge datasets demonstrate the effectiveness of our approach in protecting speaker privacy under all attack scenarios while maintaining speech naturalness and distinctiveness.",

keywords = "Speaker anonymization, VoicePrivacy challenge, privacy protection, singular value decomposition (SVD)",

author = "Jixun Yao and Qing Wang and Pengcheng Guo and Ziqian Ning and Lei Xie",

note = "Publisher Copyright: {\textcopyright} 2014 IEEE.",

year = "2024",

doi = "10.1109/TASLP.2024.3407600",

language = "English",

volume = "32",

pages = "2944--2956",

journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",

issn = "2329-9290",

publisher = "IEEE Advancing Technology for Humanity",

}

TY - JOUR

T1 - Distinctive and Natural Speaker Anonymization via Singular Value Transformation-Assisted Matrix

AU - Yao, Jixun

AU - Wang, Qing

AU - Guo, Pengcheng

AU - Ning, Ziqian

AU - Xie, Lei

PY - 2024

Y1 - 2024

N2 - Speaker anonymization is an effective privacy protection solution that aims to conceal the speaker's identity while preserving the naturalness and distinctiveness of the original speech. Mainstream approaches use an utterance-level vector from a pre-trained automatic speaker verification (ASV) model to represent speaker identity, which is then averaged or modified for anonymization. However, these systems suffer from deterioration in the naturalness of anonymized speech, degradation in speaker distinctiveness, and severe privacy leakage against powerful attackers. To address these issues, and especially to generate more natural and distinctive anonymized speech, we propose a novel speaker anonymization approach that models a matrix related to speaker identity and transforms it into an anonymized singular value transformation-assisted matrix to conceal the original speaker identity. Our approach extracts frame-level speaker vectors from a pre-trained ASV model and employs an attention mechanism to create a speaker-score matrix and speaker-related tokens. Notably, the speaker-score matrix acts as the weight for the corresponding speaker-related token, representing the speaker's identity. The singular value transformation-assisted matrix is generated by recomposing the orthonormal eigenvector matrices obtained through Singular Value Decomposition (SVD) with the non-linearly transformed singular values. This process prevents the degradation of speaker distinctiveness caused by the introduction of other speakers' identity information. By multiplying the singular value transformation-assisted matrix and speaker-related tokens, we generate the anonymized speaker identity representation, thereby producing anonymized speech that is both natural and distinctive. Experiments on VoicePrivacy Challenge datasets demonstrate the effectiveness of our approach in protecting speaker privacy under all attack scenarios while maintaining speech naturalness and distinctiveness.

AB - Speaker anonymization is an effective privacy protection solution that aims to conceal the speaker's identity while preserving the naturalness and distinctiveness of the original speech. Mainstream approaches use an utterance-level vector from a pre-trained automatic speaker verification (ASV) model to represent speaker identity, which is then averaged or modified for anonymization. However, these systems suffer from deterioration in the naturalness of anonymized speech, degradation in speaker distinctiveness, and severe privacy leakage against powerful attackers. To address these issues, and especially to generate more natural and distinctive anonymized speech, we propose a novel speaker anonymization approach that models a matrix related to speaker identity and transforms it into an anonymized singular value transformation-assisted matrix to conceal the original speaker identity. Our approach extracts frame-level speaker vectors from a pre-trained ASV model and employs an attention mechanism to create a speaker-score matrix and speaker-related tokens. Notably, the speaker-score matrix acts as the weight for the corresponding speaker-related token, representing the speaker's identity. The singular value transformation-assisted matrix is generated by recomposing the orthonormal eigenvector matrices obtained through Singular Value Decomposition (SVD) with the non-linearly transformed singular values. This process prevents the degradation of speaker distinctiveness caused by the introduction of other speakers' identity information. By multiplying the singular value transformation-assisted matrix and speaker-related tokens, we generate the anonymized speaker identity representation, thereby producing anonymized speech that is both natural and distinctive. Experiments on VoicePrivacy Challenge datasets demonstrate the effectiveness of our approach in protecting speaker privacy under all attack scenarios while maintaining speech naturalness and distinctiveness.

KW - Speaker anonymization

KW - VoicePrivacy challenge

KW - privacy protection

KW - singular value decomposition (SVD)

UR - http://www.scopus.com/inward/record.url?scp=85196614913&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2024.3407600

DO - 10.1109/TASLP.2024.3407600

M3 - Article

AN - SCOPUS:85196614913

SN - 2329-9290

VL - 32

SP - 2944

EP - 2956

JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing

JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing

ER -
