Statistical parametric speech synthesis using generative adversarial networks under a multi-task learning framework

Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dongyan Huang, Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

35 Citations (Scopus)

Abstract

In this paper, we aim at improving the performance of synthesized speech in statistical parametric speech synthesis (SPSS) based on a generative adversarial network (GAN). In particular, we propose a novel architecture combining the traditional acoustic loss function and the GAN's discriminative loss under a multi-task learning (MTL) framework. The mean squared error (MSE) is usually used to estimate the parameters of deep neural networks, which only considers the numerical difference between the raw audio and the synthesized one. To mitigate this problem, we introduce the GAN as a second task to determine whether the input is natural speech with specific conditions. In this MTL framework, the MSE optimization improves the stability of the GAN, and at the same time the GAN produces samples with a distribution closer to that of natural speech. Listening tests show that the multi-task architecture can generate more natural speech that better satisfies human perception than the conventional methods.
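The multi-task objective described above can be sketched as the sum of a frame-level MSE acoustic loss and the generator's adversarial loss. This is a minimal NumPy illustration, not the paper's implementation; the interpolation weight `adv_weight` and the non-saturating form of the adversarial term are assumptions for illustration only.

```python
import numpy as np

def mse_loss(predicted, natural):
    """Conventional acoustic loss: mean squared error between
    predicted and natural acoustic feature matrices (T frames x D dims)."""
    return float(np.mean((predicted - natural) ** 2))

def adversarial_loss(disc_scores):
    """Generator-side GAN term, -log D(G(x)), where disc_scores are the
    discriminator's probabilities that generated frames are natural speech.
    (Non-saturating form assumed for illustration.)"""
    eps = 1e-12  # guard against log(0)
    return float(-np.mean(np.log(disc_scores + eps)))

def multitask_loss(predicted, natural, disc_scores, adv_weight=0.5):
    """Combined MTL objective: the MSE task stabilizes training while the
    GAN task pulls samples toward the natural-speech distribution.
    `adv_weight` is a hypothetical balancing hyperparameter."""
    return mse_loss(predicted, natural) + adv_weight * adversarial_loss(disc_scores)

# Toy example with random acoustic feature frames.
rng = np.random.default_rng(0)
natural = rng.standard_normal((10, 25))
predicted = natural + 0.1 * rng.standard_normal((10, 25))
disc_scores = rng.uniform(0.4, 0.9, size=10)  # discriminator outputs in (0, 1)
print(multitask_loss(predicted, natural, disc_scores))
```

Because the discriminator assigns probabilities below 1 to imperfect samples, the adversarial term is strictly positive, so the combined loss exceeds the plain MSE loss whenever `adv_weight > 0`.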

Original language: English
Title of host publication: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 685-691
Number of pages: 7
ISBN (Electronic): 9781509047888
DOI
Publication status: Published - 2 Jul 2017
Event: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Okinawa, Japan
Duration: 16 Dec 2017 → 20 Dec 2017

Publication series

Name: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
Volume: 2018-January

Conference

Conference: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017
Country/Territory: Japan
City: Okinawa
Period: 16/12/17 → 20/12/17
