End-to-End Voice Conversion with Information Perturbation

Qicong Xie, Shan Yang, Yi Lei, Lei Xie, Dan Su

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Peer-review

4 Citations (Scopus)

Abstract

The ideal goal of voice conversion is to make the source speaker's speech sound naturally like the target speaker while preserving the linguistic content and prosody of the source speech. However, current approaches fall short of comprehensively transferring the source prosody while preserving the target speaker's timbre in the converted speech, and the quality of the converted speech is also unsatisfactory because of the mismatch between the acoustic model and the vocoder. In this paper, we leverage recent advances in information perturbation and propose a fully end-to-end approach to high-quality voice conversion. We first apply information perturbation to remove speaker-related information from the source speech, disentangling speaker timbre from linguistic content; the remaining linguistic information is then modeled by a content encoder. To better transfer the prosody of the source speech to the target, we introduce a speaker-related pitch encoder that maintains the general pitch pattern of the source speaker while flexibly modifying the pitch intensity of the generated speech. Finally, one-shot voice conversion is achieved through continuous speaker space modeling. Experimental results indicate that the proposed end-to-end approach significantly outperforms state-of-the-art models in terms of intelligibility, naturalness, and speaker similarity.
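The abstract does not say how the pitch encoder keeps the source pitch pattern while changing pitch intensity; in the paper this is a learned neural module. As a rough, hand-crafted stand-in only, the sketch below uses a common log-F0 mean/variance transform: it keeps the shape of the source contour, moves it into a target speaker's log-F0 range, and scales the excursions by an `intensity` factor. The function name `convert_f0`, its parameters, and the assumption that unvoiced frames are marked with F0 = 0 are all hypothetical and are not the authors' method.

```python
import math
from statistics import mean, pstdev


def convert_f0(source_f0, tgt_mean_log, tgt_std_log, intensity=1.0):
    """Hypothetical log-F0 transform (not the paper's learned pitch encoder).

    source_f0: F0 contour in Hz, with 0 marking unvoiced frames (assumed convention).
    tgt_mean_log, tgt_std_log: assumed target-speaker log-F0 mean and std.
    intensity: scales how far the contour swings around the target mean.
    """
    voiced = [math.log(f) for f in source_f0 if f > 0]
    if not voiced:
        # Nothing voiced: return the contour unchanged.
        return list(source_f0)
    src_mean = mean(voiced)
    # Guard against a flat contour (zero std) to avoid dividing by zero.
    src_std = pstdev(voiced) or 1.0
    out = []
    for f in source_f0:
        if f <= 0:
            out.append(0.0)  # keep unvoiced frames unvoiced
            continue
        # Z-normalize with source statistics so only the contour shape remains.
        z = (math.log(f) - src_mean) / src_std
        # Re-scale into the target range; intensity widens or flattens the swings.
        out.append(math.exp(tgt_mean_log + intensity * z * tgt_std_log))
    return out
```

For example, `convert_f0([100.0, 0.0, 120.0, 150.0], math.log(200.0), 0.2)` keeps the rising shape of the contour, centers its voiced frames on 200 Hz in the log domain, and leaves the unvoiced frame at 0; `intensity=0.0` collapses all voiced frames to 200 Hz, while `intensity=2.0` doubles the log-F0 spread.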

Original language: English
Host publication title: 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
Editors: Kong Aik Lee, Hung-yi Lee, Yanfeng Lu, Minghui Dong
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 91-95
Number of pages: 5
ISBN (Electronic): 9798350397963
DOI
Publication status: Published - 2022
Event: 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022 - Singapore, Singapore
Duration: 11 Dec 2022 to 14 Dec 2022

Publication series

Name: 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022

Conference

Conference: 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
Country/Territory: Singapore
City: Singapore
Period: 11/12/22 to 14/12/22
