Accent-VITS: Accent Transfer for End-to-End TTS

Linhan Ma, Yongmao Zhang, Xinfa Zhu, Yi Lei, Ziqian Ning, Pengcheng Zhu, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

2 Citations (Scopus)

Abstract

Accent transfer aims to transfer an accent from a source speaker to synthetic speech in the target speaker’s voice. The main challenge is how to effectively disentangle speaker timbre and accent, which are entangled in speech. This paper presents a VITS-based [7] end-to-end accent transfer model named Accent-VITS. Based on the main structure of VITS, Accent-VITS makes substantial improvements to enable effective and stable accent transfer. We leverage a hierarchical CVAE structure to model accent pronunciation information and acoustic features, respectively, using bottleneck features and mel spectrums as constraints. Moreover, the text-to-wave mapping in VITS is decomposed into text-to-accent and accent-to-wave mappings in Accent-VITS. In this way, the disentanglement of accent and speaker timbre becomes more stable and effective. Experiments on multi-accent and Mandarin datasets show that Accent-VITS achieves higher speaker similarity, accent similarity and speech naturalness than a strong baseline (Demos: https://anonymous-accentvits.github.io/AccentVITS/).

Original language: English
Title of host publication: Man-Machine Speech Communication - 18th National Conference, NCMMSC 2023, Proceedings
Editors: Jia Jia, Zhenhua Ling, Xie Chen, Ya Li, Zixing Zhang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 203-214
Number of pages: 12
ISBN (Print): 9789819706006
DOI
Publication status: Published - 2024
Event: 18th National Conference on Man-Machine Speech Communication, NCMMSC 2023 - Suzhou, China
Duration: 8 Dec 2023 to 11 Dec 2023

Publication series

Name: Communications in Computer and Information Science
Volume: 2006
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 18th National Conference on Man-Machine Speech Communication, NCMMSC 2023
Country/Territory: China
City: Suzhou
Period: 8/12/23 to 11/12/23
