End-to-End Voice Conversion with Information Perturbation

Qicong Xie, Shan Yang, Yi Lei, Lei Xie, Dan Su

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

4 Scopus citations

Abstract

The ideal goal of voice conversion is to make the source speaker's speech sound naturally like the target speaker while preserving the linguistic content and prosody of the source speech. However, current approaches cannot fully transfer source prosody and preserve target speaker timbre in the converted speech, and the quality of the converted speech is also unsatisfactory because of the mismatch between the acoustic model and the vocoder. In this paper, we leverage recent advances in information perturbation and propose a fully end-to-end approach to high-quality voice conversion. We first adopt information perturbation to remove speaker-related information from the source speech, disentangling speaker timbre from linguistic content; the linguistic information is then modeled by a content encoder. To better transfer the prosody of the source speech to the target, we introduce a speaker-related pitch encoder that maintains the general pitch pattern of the source speaker while flexibly modifying the pitch intensity of the generated speech. Finally, one-shot voice conversion is enabled through continuous speaker space modeling. Experimental results indicate that the proposed end-to-end approach significantly outperforms state-of-the-art models in terms of intelligibility, naturalness, and speaker similarity.
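
As a rough illustration of the information-perturbation idea named in the abstract, the following is a minimal Python sketch. The abstract does not say which perturbation functions the authors use; the random resampling shown here, the function name `perturb_speaker_cues`, and the `max_ratio` parameter are all assumptions chosen for simplicity and should not be read as the paper's method.

```python
import numpy as np


def perturb_speaker_cues(wave, rng, max_ratio=1.3):
    """Randomly stretch or compress a waveform by linear-interpolation resampling.

    Hypothetical helper, not taken from the paper. When the result is played
    back at the original sample rate, pitch and formants are shifted together
    (and duration changes), which distorts speaker cues while the spoken
    content stays the same.
    """
    # Draw a shift ratio log-uniformly from [1/max_ratio, max_ratio].
    log_r = rng.uniform(-np.log(max_ratio), np.log(max_ratio))
    ratio = float(np.exp(log_r))

    # A ratio above 1 yields fewer samples, i.e. higher pitch on playback.
    n_out = max(2, int(round(len(wave) / ratio)))
    old_pos = np.arange(len(wave))
    new_pos = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(new_pos, old_pos, wave), ratio


rng = np.random.default_rng(0)
sr = 16000                                # assumed sample rate in Hz
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 220.0 * t)      # 1 s synthetic tone standing in for speech
perturbed, ratio = perturb_speaker_cues(wave, rng)
```

In a training setup along these lines, the perturbed waveform would be fed to the content encoder so that it cannot rely on speaker-specific pitch or formant cues; how the authors actually combine perturbation with their encoders is described in the paper itself, not in this sketch.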

Original language: English
Title of host publication: 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
Editors: Kong Aik Lee, Hung-yi Lee, Yanfeng Lu, Minghui Dong
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 91-95
Number of pages: 5
ISBN (Electronic): 9798350397963
State: Published - 2022
Event: 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022 - Singapore, Singapore
Duration: 11 Dec 2022 - 14 Dec 2022

Publication series

Name: 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022

Conference

Conference: 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
Country/Territory: Singapore
City: Singapore
Period: 11/12/22 - 14/12/22

Keywords

  • any-to-any
  • end-to-end
  • voice conversion
