Optimizing Voice Conversion Network with Cycle Consistency Loss of Speaker Identity

  • Hongqiang Du
  • Xiaohai Tian
  • Lei Xie
  • Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

17 Scopus citations

Abstract

We propose a novel training scheme that optimizes a voice conversion network with a speaker identity loss function. The scheme minimizes not only the frame-level spectral loss but also a speaker identity loss. We introduce a cycle consistency loss that constrains the converted speech to maintain the same speaker identity as the reference speech at the utterance level. While the proposed training scheme is applicable to any voice conversion network, in this paper we formulate the study under the average model voice conversion framework. Experiments conducted on the CMU-ARCTIC and CSTR-VCTK corpora confirm that the proposed method outperforms baseline methods in terms of speaker similarity.
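
The combined objective described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the L1 form of the frame-level spectral loss, the cosine-distance form of the speaker identity loss, the time-averaging `utterance_embedding` function (a hypothetical stand-in for a trained speaker encoder), and the `weight` parameter that balances the two terms.

```python
import math


def spectral_loss(converted, target):
    """Frame-level loss: mean absolute error over all frames and bins (assumed L1 form)."""
    total = 0.0
    count = 0
    for conv_frame, tgt_frame in zip(converted, target):
        for c, t in zip(conv_frame, tgt_frame):
            total += abs(c - t)
            count += 1
    return total / count


def utterance_embedding(frames):
    """Hypothetical stand-in for a speaker encoder: average the frames over time."""
    n_frames = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]


def speaker_identity_loss(converted_emb, reference_emb):
    """Utterance-level loss: cosine distance between two embeddings (assumed form)."""
    dot = sum(a * b for a, b in zip(converted_emb, reference_emb))
    norm_conv = math.sqrt(sum(a * a for a in converted_emb))
    norm_ref = math.sqrt(sum(b * b for b in reference_emb))
    return 1.0 - dot / (norm_conv * norm_ref)


def total_loss(converted, target, reference_emb, weight=1.0):
    """Spectral loss plus a weighted speaker identity term (weight is an assumption)."""
    identity = speaker_identity_loss(utterance_embedding(converted), reference_emb)
    return spectral_loss(converted, target) + weight * identity
```

In this sketch the identity term is computed on an embedding extracted from the converted output itself, so it penalizes drift in speaker identity across the whole utterance even when individual frames are close to the target spectrum.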

Original language: English
Title of host publication: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 507-513
Number of pages: 7
ISBN (Electronic): 9781728170664
State: Published - 19 Jan 2021
Event: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Virtual, Online, China
Duration: 19 Jan 2021 to 22 Jan 2021

Publication series

Name: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings

Conference

Conference: 2021 IEEE Spoken Language Technology Workshop, SLT 2021
Country/Territory: China
City: Virtual, Online
Period: 19/01/21 to 22/01/21

Keywords

  • Voice conversion
  • cycle consistency loss
  • speaker embedding
