Cosine metric learning for speaker verification in the i-vector space

Zhongxin Bai; Xiao Lei Zhang; Jingdong Chen

doi:10.21437/Interspeech.2018-1593

School of Marine Science and Technology

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Conference article › peer-review

4 Citations (Scopus)

Abstract

It is known that the equal-error-rate (EER) performance of a speaker verification system is determined by the overlap region of the decision scores of true and imposter trials. Moreover, the cosine similarity scores of the true and imposter trials produced by the state-of-the-art i-vector front-end each approximately follow a Gaussian distribution, and the overlap region of the two classes of trials depends mainly on their between-class distance. Motivated by these facts, this paper presents a cosine metric learning (CML) framework for speaker verification, which combines classical compensation techniques with cosine similarity scoring to improve the EER performance. CML minimizes the overlap region by enlarging the between-class distance while introducing a regularization term to control the within-class variance. CML is initialized by a traditional channel compensation technique such as linear discriminant analysis. Experiments compare the proposed CML framework with several traditional channel compensation baselines on the NIST speaker recognition evaluation data sets. The results show that CML outperforms all the studied initialization compensation techniques.
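
The link the abstract draws between between-class distance and EER can be illustrated with a minimal Python sketch. This is not the CML algorithm from the paper. It assumes, for illustration only, that true and imposter scores are Gaussian with a shared standard deviation, in which case the EER has the closed form Phi(-d / (2 * sigma)), where d is the distance between the two class means. The function names, the score means, and the value of sigma are all hypothetical choices made for this example.

```python
import math
import random


def cosine_score(x, y):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def gaussian_eer(mu_true, mu_imp, sigma):
    """EER when both score classes are Gaussian with a shared std sigma.

    Assumption for this sketch: equal variances. The equal-error threshold
    then lies midway between the means, and the miss rate and false-alarm
    rate both equal Phi(-d / (2 * sigma)) with d = mu_true - mu_imp.
    """
    d = mu_true - mu_imp
    # Phi(-z) = 0.5 * erfc(z / sqrt(2)), with z = d / (2 * sigma)
    return 0.5 * math.erfc(d / (2.0 * sigma * math.sqrt(2.0)))


def empirical_eer(true_scores, imp_scores):
    """Approximate the EER by sweeping the threshold over observed scores."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(true_scores + imp_scores):
        miss = sum(s < t for s in true_scores) / len(true_scores)
        false_alarm = sum(s >= t for s in imp_scores) / len(imp_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2.0
    return eer


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    sigma = 0.1             # hypothetical shared score spread
    for mu_true in (0.3, 0.5):  # hypothetical true-trial score means
        true_scores = [rng.gauss(mu_true, sigma) for _ in range(1000)]
        imp_scores = [rng.gauss(0.0, sigma) for _ in range(1000)]
        print(mu_true,
              gaussian_eer(mu_true, 0.0, sigma),
              empirical_eer(true_scores, imp_scores))
```

Under these assumptions, moving the true-trial mean further from the imposter mean while holding sigma fixed lowers both the closed-form and the empirical EER. This is the effect the abstract describes as enlarging the between-class distance while controlling the within-class variance.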

Original language	English
Pages (from-to)	1126-1130
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2018-September
DOI	https://doi.org/10.21437/Interspeech.2018-1593
Publication status	Published - 2018
Event	19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India, Duration: 2 Sep 2018 → 6 Sep 2018

Cite this

@article{ef1d40ff02124d29b90fe21d76a612bf,

title = "Cosine metric learning for speaker verification in the i-vector space",

abstract = "It is known that the equal-error-rate (EER) performance of a speaker verification system is determined by the overlap region of the decision scores of true and imposter trials. Moreover, the cosine similarity scores of the true and imposter trials produced by the state-of-the-art i-vector front-end each approximately follow a Gaussian distribution, and the overlap region of the two classes of trials depends mainly on their between-class distance. Motivated by these facts, this paper presents a cosine metric learning (CML) framework for speaker verification, which combines classical compensation techniques with cosine similarity scoring to improve the EER performance. CML minimizes the overlap region by enlarging the between-class distance while introducing a regularization term to control the within-class variance. CML is initialized by a traditional channel compensation technique such as linear discriminant analysis. Experiments compare the proposed CML framework with several traditional channel compensation baselines on the NIST speaker recognition evaluation data sets. The results show that CML outperforms all the studied initialization compensation techniques.",

keywords = "Channel, Cosine metric learning, Session compensation, Speaker verification",

author = "Zhongxin Bai and Zhang, {Xiao Lei} and Jingdong Chen",

note = "Publisher Copyright: {\textcopyright} 2018 International Speech Communication Association. All rights reserved.; 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 ; Conference date: 02-09-2018 Through 06-09-2018",

year = "2018",

doi = "10.21437/Interspeech.2018-1593",

language = "English",

volume = "2018-September",

pages = "1126--1130",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - Cosine metric learning for speaker verification in the i-vector space

AU - Bai, Zhongxin

AU - Zhang, Xiao Lei

AU - Chen, Jingdong

PY - 2018

Y1 - 2018

N2 - It is known that the equal-error-rate (EER) performance of a speaker verification system is determined by the overlap region of the decision scores of true and imposter trials. Moreover, the cosine similarity scores of the true and imposter trials produced by the state-of-the-art i-vector front-end each approximately follow a Gaussian distribution, and the overlap region of the two classes of trials depends mainly on their between-class distance. Motivated by these facts, this paper presents a cosine metric learning (CML) framework for speaker verification, which combines classical compensation techniques with cosine similarity scoring to improve the EER performance. CML minimizes the overlap region by enlarging the between-class distance while introducing a regularization term to control the within-class variance. CML is initialized by a traditional channel compensation technique such as linear discriminant analysis. Experiments compare the proposed CML framework with several traditional channel compensation baselines on the NIST speaker recognition evaluation data sets. The results show that CML outperforms all the studied initialization compensation techniques.

KW - Channel

KW - Cosine metric learning

KW - Session compensation

KW - Speaker verification

UR - http://www.scopus.com/inward/record.url?scp=85054957498&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2018-1593

DO - 10.21437/Interspeech.2018-1593

M3 - Conference article

AN - SCOPUS:85054957498

SN - 2308-457X

VL - 2018-September

SP - 1126

EP - 1130

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018

Y2 - 2 September 2018 through 6 September 2018

ER -
