Automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features

Chuang Ding, Lei Xie, Jie Yan, Weini Zhang, Yang Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

38 citations (Scopus)

Abstract

Prosody affects the naturalness and intelligibility of speech. However, automatic prosody prediction from text for Chinese speech synthesis remains a major challenge, and the traditional conditional random field (CRF) based method relies heavily on feature engineering. In this paper, we propose to use neural networks to predict prosodic boundary labels directly from Chinese characters without any feature engineering. Experimental results show that stacking feed-forward and bidirectional long short-term memory (BLSTM) recurrent network layers achieves superior performance over the CRF-based method. The embedding features learned from raw text further enhance the performance.
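The abstract frames prosody prediction as assigning a boundary label to each Chinese character. As a rough illustration of that framing only, the sketch below converts an annotated sentence into per-character labels. The `#1`/`#2`/`#3` marker convention, the label values, the function name `to_char_labels`, and the example sentence are all assumptions made for illustration; none of them are taken from the paper.

```python
from __future__ import annotations

import re

# Hypothetical annotation convention (an assumption, not from the paper):
# "#1", "#2", "#3" after a character mark increasingly strong prosodic
# boundaries; characters with no following marker get label 0.
BOUNDARY = re.compile(r"#([123])")


def to_char_labels(annotated: str) -> list[tuple[str, int]]:
    """Turn an annotated sentence into (character, boundary_label) pairs."""
    pairs: list[tuple[str, int]] = []
    pos = 0
    while pos < len(annotated):
        match = BOUNDARY.match(annotated, pos)
        if match:
            if pairs:  # attach the boundary to the preceding character
                char, _ = pairs[-1]
                pairs[-1] = (char, int(match.group(1)))
            pos = match.end()
        else:
            pairs.append((annotated[pos], 0))
            pos += 1
    return pairs


# Illustrative sentence, invented for this sketch.
example = "今天#1天气#2很好#3"
pairs = to_char_labels(example)
chars = [c for c, _ in pairs]
labels = [y for _, y in pairs]
```

Under these assumptions, `chars` would be the input sequence and `labels` the target sequence for a character-level sequence labeler, whether CRF-based or neural.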

Original language: English
Title of host publication: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 98-102
Number of pages: 5
ISBN (Electronic): 9781479972913
Publication status: Published - 10 Feb 2016
Event: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Scottsdale, United States
Duration: 13 Dec 2015 → 17 Dec 2015

Publication series

Name: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings

Conference

Conference: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015
Country/Territory: United States
City: Scottsdale
Period: 13/12/15 → 17/12/15
