Partial AUC Optimization Based Deep Speaker Embeddings with Class-Center Learning for Text-Independent Speaker Verification

Zhongxin Bai; Xiao Lei Zhang; Jingdong Chen

doi:10.1109/ICASSP40776.2020.9053674

Partial AUC Optimization Based Deep Speaker Embeddings with Class-Center Learning for Text-Independent Speaker Verification

Zhongxin Bai, Xiao Lei Zhang, Jingdong Chen

航海学院

Northwestern Polytechnical University Xian

科研成果: 书/报告/会议事项章节 › 会议稿件 › 同行评审

27 引用（Scopus）

摘要

Deep embedding based text-independent speaker verification has demonstrated superior performance to traditional methods in many challenging scenarios. Its loss functions can be generally categorized into two classes, i.e., verification and identification. The verification loss functions match the pipeline of speaker verification, but their implementations are difficult. Thus, most state-of-the-art deep embedding methods use the identification loss functions with softmax output units or their variants. In this paper, we propose a verification loss function, named the maximization of partial area under the Receiver-operating-characteristic (ROC) curve (pAUC), for deep embedding based text-independent speaker verification. We also propose a class-center based training trial construction method to improve the training efficiency, which is critical for the proposed loss function to be comparable to the identification loss in performance. Experiments on the Speaker in the Wild (SITW) and NIST SRE 2016 datasets show that the proposed pAUC loss function is highly competitive with the state-of-the-art identification loss functions.

源语言	英语
主期刊名	2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
出版商	Institute of Electrical and Electronics Engineers Inc.
页	6819-6823
页数	5
ISBN（电子版）	9781509066315
DOI	https://doi.org/10.1109/ICASSP40776.2020.9053674
出版状态	已出版 - 5月 2020
活动	2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, 西班牙期限: 4 5月 2020 → 8 5月 2020

出版系列

姓名	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
卷	2020-May
ISSN（印刷版）	1520-6149

会议

会议	2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
国家/地区	西班牙
市	Barcelona
时期	4/05/20 → 8/05/20

访问文件

10.1109/ICASSP40776.2020.9053674

其它文件与链接

链接到 Scopus 的出版物

引用此

Bai, Z., Zhang, X. L., & Chen, J. (2020). Partial AUC Optimization Based Deep Speaker Embeddings with Class-Center Learning for Text-Independent Speaker Verification. 在 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings (页码 6819-6823). 文章 9053674 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; 卷 2020-May). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICASSP40776.2020.9053674

Bai, Zhongxin ; Zhang, Xiao Lei ; Chen, Jingdong. / Partial AUC Optimization Based Deep Speaker Embeddings with Class-Center Learning for Text-Independent Speaker Verification. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2020. 页码 6819-6823 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{3a724dac3f6a49be8e4307bb65754de3,

title = "Partial AUC Optimization Based Deep Speaker Embeddings with Class-Center Learning for Text-Independent Speaker Verification",

abstract = "Deep embedding based text-independent speaker verification has demonstrated superior performance to traditional methods in many challenging scenarios. Its loss functions can be generally categorized into two classes, i.e., verification and identification. The verification loss functions match the pipeline of speaker verification, but their implementations are difficult. Thus, most state-of-the-art deep embedding methods use the identification loss functions with softmax output units or their variants. In this paper, we propose a verification loss function, named the maximization of partial area under the Receiver-operating-characteristic (ROC) curve (pAUC), for deep embedding based text-independent speaker verification. We also propose a class-center based training trial construction method to improve the training efficiency, which is critical for the proposed loss function to be comparable to the identification loss in performance. Experiments on the Speaker in the Wild (SITW) and NIST SRE 2016 datasets show that the proposed pAUC loss function is highly competitive with the state-of-the-art identification loss functions.",

keywords = "pAUC optimization, speaker centers, speaker verification, verification loss",

author = "Zhongxin Bai and Zhang, {Xiao Lei} and Jingdong Chen",

note = "Publisher Copyright: {\textcopyright} 2020 IEEE.; 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 ; Conference date: 04-05-2020 Through 08-05-2020",

year = "2020",

month = may,

doi = "10.1109/ICASSP40776.2020.9053674",

language = "英语",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "6819--6823",

booktitle = "2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings",

}

Bai, Z, Zhang, XL & Chen, J 2020, Partial AUC Optimization Based Deep Speaker Embeddings with Class-Center Learning for Text-Independent Speaker Verification. 在 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings., 9053674, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 卷 2020-May, Institute of Electrical and Electronics Engineers Inc., 页码 6819-6823, 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020, Barcelona, 西班牙, 4/05/20. https://doi.org/10.1109/ICASSP40776.2020.9053674

Partial AUC Optimization Based Deep Speaker Embeddings with Class-Center Learning for Text-Independent Speaker Verification. / Bai, Zhongxin; Zhang, Xiao Lei; Chen, Jingdong.
2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2020. 页码 6819-6823 9053674 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; 卷 2020-May).

科研成果: 书/报告/会议事项章节 › 会议稿件 › 同行评审

TY - GEN

T1 - Partial AUC Optimization Based Deep Speaker Embeddings with Class-Center Learning for Text-Independent Speaker Verification

AU - Bai, Zhongxin

AU - Zhang, Xiao Lei

AU - Chen, Jingdong

PY - 2020/5

Y1 - 2020/5

N2 - Deep embedding based text-independent speaker verification has demonstrated superior performance to traditional methods in many challenging scenarios. Its loss functions can be generally categorized into two classes, i.e., verification and identification. The verification loss functions match the pipeline of speaker verification, but their implementations are difficult. Thus, most state-of-the-art deep embedding methods use the identification loss functions with softmax output units or their variants. In this paper, we propose a verification loss function, named the maximization of partial area under the Receiver-operating-characteristic (ROC) curve (pAUC), for deep embedding based text-independent speaker verification. We also propose a class-center based training trial construction method to improve the training efficiency, which is critical for the proposed loss function to be comparable to the identification loss in performance. Experiments on the Speaker in the Wild (SITW) and NIST SRE 2016 datasets show that the proposed pAUC loss function is highly competitive with the state-of-the-art identification loss functions.

AB - Deep embedding based text-independent speaker verification has demonstrated superior performance to traditional methods in many challenging scenarios. Its loss functions can be generally categorized into two classes, i.e., verification and identification. The verification loss functions match the pipeline of speaker verification, but their implementations are difficult. Thus, most state-of-the-art deep embedding methods use the identification loss functions with softmax output units or their variants. In this paper, we propose a verification loss function, named the maximization of partial area under the Receiver-operating-characteristic (ROC) curve (pAUC), for deep embedding based text-independent speaker verification. We also propose a class-center based training trial construction method to improve the training efficiency, which is critical for the proposed loss function to be comparable to the identification loss in performance. Experiments on the Speaker in the Wild (SITW) and NIST SRE 2016 datasets show that the proposed pAUC loss function is highly competitive with the state-of-the-art identification loss functions.

KW - pAUC optimization

KW - speaker centers

KW - speaker verification

KW - verification loss

UR - http://www.scopus.com/inward/record.url?scp=85089240662&partnerID=8YFLogxK

U2 - 10.1109/ICASSP40776.2020.9053674

DO - 10.1109/ICASSP40776.2020.9053674

M3 - 会议稿件

AN - SCOPUS:85089240662

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 6819

EP - 6823

BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020

Y2 - 4 May 2020 through 8 May 2020

ER -

Bai Z, Zhang XL, Chen J. Partial AUC Optimization Based Deep Speaker Embeddings with Class-Center Learning for Text-Independent Speaker Verification. 在 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2020. 页码 6819-6823. 9053674. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP40776.2020.9053674

Partial AUC Optimization Based Deep Speaker Embeddings with Class-Center Learning for Text-Independent Speaker Verification

摘要

出版系列

会议

访问文件

其它文件与链接

指纹

引用此