A comparison of expressive speech synthesis approaches based on neural network

Liumeng Xue, Xiaolian Zhu, Xiaochun An, Lei Xie

科研成果: 书/报告/会议事项章节会议稿件同行评审

5 引用 (Scopus)

摘要

Adaptability and controllability in changing speaking styles and speaker characteristics are the advantages of deep neural networks (DNNs) based statistical parametric speech synthesis (SPSS). This paper presents a comprehensive study on the use of DNNs for expressive speech synthesis with a small set of emotional speech data. Specifically, we study three typical model adaptation approaches: (1) retraining a neural model by emotion-specific data (retrain), (2) augmenting the network input using emotion-specific codes (code) and (3) using emotion-dependent output layers with shared hidden layers (multi-head). Long-short term memory (LSTM) networks are used as the acoustic models. Objective and subjective evaluations have demonstrated that the multi-head approach consistently outperforms the other two approaches with more natural emotion delivered in the synthesized speech.

源语言英语
主期刊名ASMMC-MMAC 2018 - Proceedings of the Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and 1st Multi-Modal Affective Computing of Large-Scale Multimedia Data, Co-located with MM 2018
出版商Association for Computing Machinery, Inc
15-20
页数6
ISBN(电子版)9781450359856
DOI
出版状态已出版 - 19 10月 2018
活动Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and 1st Multi-Modal Affective Computing of Large-Scale Multimedia Data Workshop, ASMMC-MMAC 2018 - Seoul, 韩国
期限: 26 10月 2018 → …

出版系列

姓名ASMMC-MMAC 2018 - Proceedings of the Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and 1st Multi-Modal Affective Computing of Large-Scale Multimedia Data, Co-located with MM 2018

会议

会议Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and 1st Multi-Modal Affective Computing of Large-Scale Multimedia Data Workshop, ASMMC-MMAC 2018
国家/地区韩国
Seoul
时期26/10/18 → …

指纹

探究 'A comparison of expressive speech synthesis approaches based on neural network' 的科研主题。它们共同构成独一无二的指纹。

引用此