Cosine metric learning based speaker verification

Zhongxin Bai; Xiao Lei Zhang; Jingdong Chen

doi:10.1016/j.specom.2020.02.003

School of Marine Science and Technology

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

The performance of speaker verification depends on the overlap region of the decision scores of true and imposter trials. Motivated by the fact that the overlap region can be reduced by maximizing the between-class distance while minimizing the within-class variance of the trials, we present in this paper two cosine metric learning (CML) back-end algorithms. The first one, named m-CML, aims to enlarge the between-class distance with a regularization term to control the within-class variance. The second one, named v-CML, attempts to reduce the within-class variance with a regularization term to prevent the between-class distance from getting smaller. The regularization terms in the CML methods can be initialized by a traditional channel compensation method, e.g., linear discriminant analysis. These two algorithms are combined with front-end processing for speaker verification. To validate their effectiveness, m-CML is combined with an i-vector front-end, since it is good at enlarging the between-class distance of Gaussian score distributions, while v-CML is combined with a d-vector or x-vector front-end, as it is able to significantly reduce the within-class variance of heavy-tailed score distributions. Experimental results on the NIST and SITW speaker recognition evaluation corpora show that the proposed algorithms outperform their initialization channel compensation methods and are competitive with the probabilistic linear discriminant analysis back-end in terms of performance. For comparison, we also applied the m-CML and v-CML methods to the i-vector and x-vector front-ends.
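
The two quantities the abstract refers to, the between-class distance and the within-class variance of the trial scores, can be illustrated with a small sketch. This is not the authors' m-CML or v-CML objective, which the abstract does not spell out. The function names `cosine_score` and `score_statistics`, the optional linear map `A`, and the choice to sum the two class variances are all assumptions made for illustration only.

```python
import math
from statistics import mean, pvariance


def cosine_score(x, y, A=None):
    """Cosine similarity of two embeddings, optionally after a linear map A.

    A is a hypothetical projection given as a list of rows; it stands in for
    whatever transform a back-end might learn and is not taken from the paper.
    """
    if A is not None:
        x = [sum(a * v for a, v in zip(row, x)) for row in A]
        y = [sum(a * v for a, v in zip(row, y)) for row in A]
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm


def score_statistics(true_scores, impostor_scores):
    """Return (between-class distance, within-class variance) of trial scores.

    Assumption: the distance is the gap between the two class means, and the
    within-class variance is the sum of the two population variances.
    """
    between = mean(true_scores) - mean(impostor_scores)
    within = pvariance(true_scores) + pvariance(impostor_scores)
    return between, within
```

For example, `score_statistics([0.9, 0.7], [0.1, 0.3])` compares two true-trial scores against two imposter-trial scores. A larger first value together with a smaller second value corresponds to less overlap between the two score distributions, which is the effect the abstract describes.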

Original language	English
Pages (from-to)	10-20
Number of pages	11
Journal	Speech Communication
Volume	118
DOIs	https://doi.org/10.1016/j.specom.2020.02.003
State	Published - Apr 2020

Keywords

Cosine metric learning
Inter-session variability compensation
Speaker verification

Cite this

@article{c3a3dee52fd04c13af4b2a59d37fe3e7,

title = "Cosine metric learning based speaker verification",

abstract = "The performance of speaker verification depends on the overlap region of the decision scores of true and imposter trials. Motivated by the fact that the overlap region can be reduced by maximizing the between-class distance while minimizing the within-class variance of the trials, we present in this paper two cosine metric learning (CML) back-end algorithms. The first one, named m-CML, aims to enlarge the between-class distance with a regularization term to control the within-class variance. The second one, named v-CML, attempts to reduce the within-class variance with a regularization term to prevent the between-class distance from getting smaller. The regularization terms in the CML methods can be initialized by a traditional channel compensation method, e.g., the linear discriminant analysis. These two algorithms are combined with front-end processing for speaker verification. To validate their effectiveness, m-CML is combined with an i-vector front-end since it is good at enlarging the between-class distance of Gaussian score distributions while v-CML is combined with a d-vector or x-vector front-end as it is able to reduce the within-class variance of heavy-tailed score distributions significantly. Experimental results on the NIST and SITW speaker recognition evaluation corpora show that the proposed algorithms outperform their initialization channel compensation methods, and are competitive to the probabilistic linear discriminant analysis back-end in terms of performance. For comparison, we also applied the m-CML and v-CML methods to the i-vector and x-vector front-ends.",

keywords = "Cosine metric learning, Inter-session variability compensation, Speaker verification",

author = "Zhongxin Bai and Zhang, {Xiao Lei} and Jingdong Chen",

note = "Publisher Copyright: {\textcopyright} 2020 Elsevier B.V.",

year = "2020",

month = apr,

doi = "10.1016/j.specom.2020.02.003",

language = "English",

volume = "118",

pages = "10--20",

journal = "Speech Communication",

issn = "0167-6393",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Cosine metric learning based speaker verification

AU - Bai, Zhongxin

AU - Zhang, Xiao Lei

AU - Chen, Jingdong

PY - 2020/4

Y1 - 2020/4

AB - The performance of speaker verification depends on the overlap region of the decision scores of true and imposter trials. Motivated by the fact that the overlap region can be reduced by maximizing the between-class distance while minimizing the within-class variance of the trials, we present in this paper two cosine metric learning (CML) back-end algorithms. The first one, named m-CML, aims to enlarge the between-class distance with a regularization term to control the within-class variance. The second one, named v-CML, attempts to reduce the within-class variance with a regularization term to prevent the between-class distance from getting smaller. The regularization terms in the CML methods can be initialized by a traditional channel compensation method, e.g., the linear discriminant analysis. These two algorithms are combined with front-end processing for speaker verification. To validate their effectiveness, m-CML is combined with an i-vector front-end since it is good at enlarging the between-class distance of Gaussian score distributions while v-CML is combined with a d-vector or x-vector front-end as it is able to reduce the within-class variance of heavy-tailed score distributions significantly. Experimental results on the NIST and SITW speaker recognition evaluation corpora show that the proposed algorithms outperform their initialization channel compensation methods, and are competitive to the probabilistic linear discriminant analysis back-end in terms of performance. For comparison, we also applied the m-CML and v-CML methods to the i-vector and x-vector front-ends.

KW - Cosine metric learning

KW - Inter-session variability compensation

KW - Speaker verification

UR - http://www.scopus.com/inward/record.url?scp=85080055012&partnerID=8YFLogxK

U2 - 10.1016/j.specom.2020.02.003

DO - 10.1016/j.specom.2020.02.003

M3 - Article

AN - SCOPUS:85080055012

SN - 0167-6393

VL - 118

SP - 10

EP - 20

JO - Speech Communication

JF - Speech Communication

ER -
