End-to-End Speaker Verification via Curriculum Bipartite Ranking Weighted Binary Cross-Entropy

Zhongxin Bai, Jianyu Wang, Xiao Lei Zhang, Jingdong Chen

Research output: Contribution to journal › Article › peer-review

24 Scopus citations

Abstract

End-to-end speaker verification directly estimates the similarity score between a pair of utterances, which is formulated as a binary (i.e., target versus non-target) classification problem. Unlike the stage-wise method, an end-to-end approach optimizes the evaluation metrics directly, and its output layer is parameter-free, which saves considerable computing and memory resources. However, it faces two important difficulties. The first is how to deal with severely imbalanced trials, i.e., the number of target trials is much smaller than that of non-target trials; the second is how to handle easy trials that do not help improve the model during training. To address these two issues, we propose in this paper a binary cross-entropy (BCE) type of loss function and present a method to train deep neural network (DNN) models with the proposed loss function for end-to-end speaker verification. The training process employs a bipartite ranking method to deal with the trial imbalance problem, and a curriculum learning method to improve both the training stability and the performance of the model by gradually selecting non-target trials from easy to hard as training converges. Since the training process employs bipartite ranking and curriculum learning, and the loss function is of the generalized BCE form, we name the new approach curriculum bipartite ranking weighted binary cross-entropy (CBRW-BCE). Experimental results show that the model trained with CBRW-BCE not only achieves state-of-the-art performance but is also well calibrated.
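The abstract gives no formulas, so the following is only a rough sketch of the two ideas it names, not the CBRW-BCE loss itself. It assumes raw similarity scores used as logits, a plain class weight on target trials as a stand-in for the bipartite ranking step, and a hypothetical sliding-window rule that moves the selected non-target trials from easy (low score) to hard (high score) as a `progress` value goes from 0 to 1. The names `weighted_bce`, `curriculum_mask`, `n_keep`, and `progress` are illustrative assumptions.

```python
import numpy as np


def weighted_bce(scores, labels, pos_weight=1.0):
    """Class-weighted BCE on raw similarity scores treated as logits.

    Assumption: a single weight on target trials; this is a simple
    imbalance correction, not the paper's bipartite ranking weighting.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Numerically stable log(sigmoid(s)) and log(1 - sigmoid(s)).
    log_p = -np.logaddexp(0.0, -scores)
    log_1mp = -np.logaddexp(0.0, scores)
    per_trial = -(pos_weight * labels * log_p + (1.0 - labels) * log_1mp)
    return per_trial.mean()


def curriculum_mask(scores, labels, n_keep, progress):
    """Keep every target trial plus n_keep non-target trials.

    Hypothetical schedule: non-target trials are sorted from easiest
    (lowest score) to hardest (highest score), and a window of size
    n_keep slides from the easy end (progress=0) to the hard end
    (progress=1).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    nontarget_idx = np.flatnonzero(labels == 0)
    order = nontarget_idx[np.argsort(scores[nontarget_idx])]
    n_keep = min(n_keep, order.size)
    start = int(round(progress * (order.size - n_keep)))
    mask = labels == 1  # all target trials are always kept
    mask[order[start:start + n_keep]] = True
    return mask


# Toy batch: 2 target trials and 4 non-target trials.
scores = np.array([3.0, 2.0, -4.0, -1.0, 0.5, 2.5])
labels = np.array([1, 1, 0, 0, 0, 0])

for progress in (0.0, 1.0):
    keep = curriculum_mask(scores, labels, n_keep=2, progress=progress)
    loss = weighted_bce(scores[keep], labels[keep])
    print(f"progress={progress:.1f} kept={np.flatnonzero(keep)} loss={loss:.4f}")
```

Early in training the window covers non-target trials the model already rejects confidently, so the loss is small and stable; later it covers the high-scoring non-target trials that are most easily confused with targets. How the published method actually schedules and weights trials should be taken from the paper.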

Original language: English
Pages (from-to): 1330-1344
Number of pages: 15
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 30
State: Published - 2022

Keywords

  • bipartite ranking
  • calibration
  • curriculum learning
  • End-to-end
  • metric learning
