ONE-SHOT VOICE CONVERSION FOR STYLE TRANSFER BASED ON SPEAKER ADAPTATION

Zhichao Wang, Qicong Xie, Tao Li, Hongqiang Du, Lei Xie, Pengcheng Zhu, Mengxiao Bi

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

10 citations (Scopus)

Abstract

One-shot style transfer is a challenging task: training on a single utterance makes the model extremely prone to over-fitting the training data, which causes low speaker similarity and a lack of expressiveness. In this paper, we build on the recognition-synthesis framework and propose a one-shot voice conversion approach for style transfer based on speaker adaptation. First, a speaker normalization module is adopted to remove speaker-related information from the bottleneck features extracted by ASR. Second, we adopt weight regularization in the adaptation process to prevent the over-fitting caused by using only one utterance from the target speaker as training data. Finally, to comprehensively decouple the speech factors (content, speaker, and style) and transfer the source style to the target, a prosody module is used to extract a prosody representation. Experiments show that our approach is superior to state-of-the-art one-shot VC systems in terms of style and speaker similarity; additionally, it maintains good speech quality.
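As a rough illustration of the weight-regularization idea mentioned in the abstract, the sketch below adapts a toy linear model on a single example while penalizing drift from its pre-trained weights. It is not the paper's model: the abstract does not specify the regularizer, so the L2 penalty toward the pre-trained weights, the function `adapt`, and the parameters `lam`, `lr`, and `steps` are all assumptions made for illustration.

```python
import numpy as np

def adapt(w_pre, x, y, lam, lr=0.05, steps=200):
    """Fine-tune a toy linear model (prediction = w . x) on ONE example.

    Assumed regularizer: lam * ||w - w_pre||^2, which pulls the adapted
    weights back toward the pre-trained weights w_pre.
    """
    w = w_pre.copy()
    for _ in range(steps):
        err = w @ x - y                       # scalar prediction error
        grad_task = 2.0 * err * x             # gradient of (w . x - y)^2
        grad_reg = 2.0 * lam * (w - w_pre)    # gradient of the assumed penalty
        w -= lr * (grad_task + grad_reg)
    return w

rng = np.random.default_rng(0)
w_pre = rng.normal(size=8)        # stand-in for pre-trained weights
x = rng.normal(size=8)
x /= np.linalg.norm(x)            # unit-norm input keeps the step size stable
y = 3.0                           # target for the single adaptation example

w_free = adapt(w_pre, x, y, lam=0.0)   # unregularized adaptation
w_reg = adapt(w_pre, x, y, lam=1.0)    # regularized adaptation

# Distance each adapted model has moved away from the pre-trained weights.
drift_free = np.linalg.norm(w_free - w_pre)
drift_reg = np.linalg.norm(w_reg - w_pre)
print("drift without regularization:", drift_free)
print("drift with regularization:   ", drift_reg)
```

In this toy setting the unregularized run fits the single example exactly, while the regularized run trades some fit for staying closer to the pre-trained weights, which is the general mechanism by which such a penalty limits over-fitting to one utterance.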

Original language: English
Title of host publication: 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6792-6796
Number of pages: 5
ISBN (electronic): 9781665405409
DOI
Publication status: Published - 2022
Event: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 - Hybrid, Singapore
Duration: 22 May 2022 to 27 May 2022

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2022-May
ISSN (print): 1520-6149

Conference

Conference: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022
Country/Territory: Singapore
Hybrid
Period: 22/05/22 to 27/05/22
