AccentSpeech: Learning Accent from Crowd-sourced Data for Target Speaker TTS with Accents

Yongmao Zhang, Zhichao Wang, Peiji Yang, Hongshen Sun, Zhisheng Wang, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

4 Citations (Scopus)

Abstract

Learning accent from crowd-sourced data is a feasible way to build a target-speaker TTS system that can synthesize accented speech. Two challenging problems must be solved to this end. First, directly using crowd-sourced data of poor acoustic quality together with the target speaker data for accent transfer leads to synthetic speech with degraded quality. To mitigate this problem, we take a bottleneck feature (BN) based TTS approach, in which TTS is decomposed into a Text-to-BN (T2BN) module to learn accent and a BN-to-Mel (BN2Mel) module to learn speaker timbre, where the neural network based BN feature serves as an intermediate representation that is robust to noise interference. Second, directly training T2BN on the crowd-sourced data in the two-stage system produces accented speech of the target speaker with poor prosody, because the crowd-sourced recordings are contributed by ordinary, non-professional speakers. To tackle this problem, we extend the two-stage approach to a novel three-stage approach, where T2BN and BN2Mel are trained on the high-quality target speaker data and a new BN-to-BN (BN2BN) module is plugged in between the two modules to perform accent transfer. To train the BN2BN module, parallel unaccented and accented BN features are obtained by a proposed data augmentation procedure. Finally, the proposed three-stage approach produces accented speech for the target speaker with good prosody, as the prosody pattern is inherited from the professional target speaker while accent transfer is achieved by the BN2BN module. The proposed approach, named AccentSpeech, is validated on a Mandarin TTS accent transfer task.
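
The data flow of the three-stage design described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function `three_stage_tts`, the `toy_*` stand-ins, and the feature shapes are hypothetical and are not taken from the paper, whose modules are trained neural networks. The sketch only shows the order in which the stages are composed.

```python
from typing import Callable, List

Frame = List[float]        # one feature frame (BN or mel), toy representation
Features = List[Frame]     # a sequence of frames


def three_stage_tts(
    text: str,
    t2bn: Callable[[str], Features],
    bn2bn: Callable[[Features], Features],
    bn2mel: Callable[[Features], Features],
) -> Features:
    """Compose the three stages: text -> BN -> accented BN -> mel."""
    unaccented_bn = t2bn(text)           # stage 1: Text-to-BN
    accented_bn = bn2bn(unaccented_bn)   # stage 2: BN-to-BN accent transfer
    return bn2mel(accented_bn)           # stage 3: BN-to-Mel


# Hypothetical toy stand-ins; real modules would be trained networks.
def toy_t2bn(text: str) -> Features:
    # One 2-dimensional "BN" frame per input character.
    return [[float(ord(ch) % 7), 1.0] for ch in text]


def toy_bn2bn(bn: Features) -> Features:
    # Pretend accent transfer: shift every feature value by a constant.
    return [[value + 0.5 for value in frame] for frame in bn]


def toy_bn2mel(bn: Features) -> Features:
    # Pretend acoustic model: collapse each BN frame to one "mel" value.
    return [[sum(frame)] for frame in bn]


mel = three_stage_tts("ni hao", toy_t2bn, toy_bn2bn, toy_bn2mel)
print(len(mel), "frames")
```

Passing an identity function as `bn2bn` gives, structurally, the two-stage layout (T2BN followed by BN2Mel) that the abstract describes as the starting point.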

Original language: English
Title of host publication: 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
Editors: Kong Aik Lee, Hung-yi Lee, Yanfeng Lu, Minghui Dong
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 76-80
Number of pages: 5
ISBN (Electronic): 9798350397963
DOI
Publication status: Published - 2022
Event: 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022 - Singapore, Singapore
Duration: 11 Dec 2022 to 14 Dec 2022

Publication series

Name: 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022

Conference

Conference: 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
Country/Territory: Singapore
Singapore
Period: 11/12/22 to 14/12/22
