Cosine metric learning based speaker verification

Zhongxin Bai, Xiao Lei Zhang, Jingdong Chen

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

The performance of speaker verification depends on the overlap region between the decision scores of true and imposter trials. Motivated by the fact that this overlap can be reduced by maximizing the between-class distance while minimizing the within-class variance of the trials, we present in this paper two cosine metric learning (CML) back-end algorithms. The first, named m-CML, aims to enlarge the between-class distance, with a regularization term to control the within-class variance. The second, named v-CML, attempts to reduce the within-class variance, with a regularization term to prevent the between-class distance from shrinking. The regularization terms in the CML methods can be initialized by a traditional channel compensation method, e.g., linear discriminant analysis. These two algorithms are combined with front-end processing for speaker verification. To validate their effectiveness, m-CML is combined with an i-vector front-end, since it is good at enlarging the between-class distance of Gaussian score distributions, while v-CML is combined with a d-vector or x-vector front-end, as it is able to significantly reduce the within-class variance of heavy-tailed score distributions. Experimental results on the NIST and SITW speaker recognition evaluation corpora show that the proposed algorithms outperform their initialization channel compensation methods and are competitive with the probabilistic linear discriminant analysis back-end. For comparison, we also applied the m-CML and v-CML methods to the i-vector and x-vector front-ends.
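
The two quantities the abstract refers to can be illustrated with a minimal Python sketch. It is not the authors' m-CML or v-CML objective: it only computes cosine scores under a linear transform and reports the gap between the mean true and imposter scores and the spread of the scores. The function names, the synthetic data, the identity transform `A`, and the choice to define within-class variance as the sum of the two per-class score variances are all assumptions made for illustration.

```python
import numpy as np

def cosine_scores(A, x, y):
    """Cosine similarity between paired rows of x and y after the linear map A.

    A is a hypothetical (dim x dim) transform standing in for a learned metric.
    """
    px, py = x @ A.T, y @ A.T
    num = np.sum(px * py, axis=1)
    den = np.linalg.norm(px, axis=1) * np.linalg.norm(py, axis=1)
    return num / den

def score_statistics(A, true_pairs, imposter_pairs):
    """Return (between-class distance, within-class variance) of the scores.

    Assumed definitions: distance is the difference of the class means,
    variance is the sum of the two per-class variances.
    """
    s_true = cosine_scores(A, *true_pairs)
    s_imp = cosine_scores(A, *imposter_pairs)
    between = s_true.mean() - s_imp.mean()
    within = s_true.var() + s_imp.var()
    return between, within

# Synthetic stand-ins for speaker embeddings (not real i-vectors or x-vectors).
rng = np.random.default_rng(0)
dim, n = 8, 200
speakers = rng.normal(size=(n, dim))

# True trials: two noisy observations of the same speaker vector.
true_pairs = (speakers + 0.3 * rng.normal(size=(n, dim)),
              speakers + 0.3 * rng.normal(size=(n, dim)))
# Imposter trials: each speaker paired with an unrelated random vector.
imposter_pairs = (speakers, rng.normal(size=(n, dim)))

A = np.eye(dim)  # identity map, i.e. plain cosine scoring
between, within = score_statistics(A, true_pairs, imposter_pairs)
print(f"between-class distance: {between:.3f}")
print(f"within-class variance:  {within:.3f}")
```

Under these assumptions, a back-end that learns `A` would aim for a larger `between` and a smaller `within` than the identity map gives, which corresponds to a smaller overlap between the two score distributions.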

Original language: English
Pages (from-to): 10-20
Number of pages: 11
Journal: Speech Communication
Volume: 118
DOIs
State: Published - Apr 2020

Keywords

  • Cosine metric learning
  • Inter-session variability compensation
  • Speaker verification
