Improving Mandarin End-to-End Speech Synthesis by Self-Attention and Learnable Gaussian Bias

Fengyu Yang, Shan Yang, Pengcheng Zhu, Pengju Yan, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

15 Citations (Scopus)

Abstract

Compared to conventional speech synthesis, end-to-end speech synthesis has achieved much better naturalness with a simpler system-building pipeline. An end-to-end framework can generate natural speech directly from characters for English. For other languages such as Chinese, however, recent studies have indicated that extra engineered features, e.g., word boundaries and prosody boundaries, are still needed for model robustness and naturalness, which makes the front-end pipeline as complicated as in the traditional approach. To maintain the naturalness of generated speech while discarding language-specific expertise as much as possible, we introduce a novel self-attention based encoder with learnable Gaussian bias in Tacotron for Mandarin TTS. We evaluate different systems with and without complex prosody information, and the results show that the proposed approach can generate stable and natural speech with minimal language-dependent front-end modules.

Original language: English
Title of host publication: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 208-213
Number of pages: 6
ISBN (Electronic): 9781728103068
DOI
Publication status: Published - Dec 2019
Event: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Singapore, Singapore
Duration: 15 Dec 2019 to 18 Dec 2019

Publication series

Name: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings

Conference

Conference: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019
Country/Territory: Singapore
City: Singapore
Period: 15/12/19 to 18/12/19
