A Kullback-Leibler divergence based recurrent mixture density network for acoustic modeling in emotional statistical parametric speech synthesis

Xiaochun An, Yuchao Zhang, Bing Liu, Liumeng Xue, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

This paper proposes a Kullback-Leibler divergence (KLD) based recurrent mixture density network (RMDN) approach for acoustic modeling in emotional statistical parametric speech synthesis (SPSS), which aims at improving model accuracy and emotion naturalness. First, to improve model accuracy, we propose to use an RMDN as the acoustic model, which combines an LSTM with a mixture density network (MDN). Adding a mixture density layer allows us to perform multimodal regression and to predict variances, thus modeling the probability density functions of acoustic features more accurately. Second, we introduce Kullback-Leibler divergence regularization in model training. Inspired by KLD's success in acoustic model adaptation, we aim to improve emotion naturalness by maximizing the distance between the distributions of emotional speech and neutral speech. Objective and subjective evaluations show that the proposed approach improves the prediction accuracy of acoustic features and the naturalness of the synthesized emotional speech.
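The abstract does not give the loss formulas, so the following is only a minimal sketch of the two ingredients it names: the negative log-likelihood that a mixture density layer is typically trained with, and a KLD term between two distributions. Everything here is an assumption made for illustration: the one-dimensional Gaussians, the function names, the `weight` parameter, and the way the KLD term is subtracted from the likelihood loss. It should not be read as the authors' actual formulation.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def mdn_nll(x, weights, means, variances):
    """Negative log-likelihood of a scalar x under a 1-D Gaussian mixture.

    The three lists stand in for the per-frame outputs of a mixture
    density layer (positive mixture weights that sum to 1, component
    means, and component variances).
    """
    log_terms = [
        math.log(w) + gaussian_log_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Log-sum-exp over components for numerical stability.
    peak = max(log_terms)
    return -(peak + math.log(sum(math.exp(t - peak) for t in log_terms)))


def gaussian_kld(mean_p, var_p, mean_q, var_q):
    """Closed-form KL(p || q) between two 1-D Gaussians."""
    return 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mean_p - mean_q) ** 2) / var_q
        - 1.0
    )


def regularized_loss(nll, kld, weight=0.1):
    """Hypothetical combined objective (an assumption, not from the paper).

    Subtracting the KLD term means that minimizing the loss pushes the
    two distributions apart, matching the abstract's stated goal of
    maximizing the distance between emotional and neutral speech.
    """
    return nll - weight * kld
```

In this sketch, a single-component mixture reduces to an ordinary Gaussian likelihood, and `gaussian_kld` is zero only when the two distributions coincide, so a larger KLD lowers the combined loss.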

Original language: English
Title of host publication: ASMMC-MMAC 2018 - Proceedings of the Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and 1st Multi-Modal Affective Computing of Large-Scale Multimedia Data, Co-located with MM 2018
Publisher: Association for Computing Machinery, Inc
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9781450359856
Publication status: Published - 19 Oct 2018
Event: Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and 1st Multi-Modal Affective Computing of Large-Scale Multimedia Data Workshop, ASMMC-MMAC 2018 - Seoul, South Korea
Duration: 26 Oct 2018 → …

Publication series

Name: ASMMC-MMAC 2018 - Proceedings of the Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and 1st Multi-Modal Affective Computing of Large-Scale Multimedia Data, Co-located with MM 2018

Conference

Conference: Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and 1st Multi-Modal Affective Computing of Large-Scale Multimedia Data Workshop, ASMMC-MMAC 2018
Country/Territory: South Korea
City: Seoul
Period: 26/10/18 → …
