Automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features

Chuang Ding, Lei Xie, Jie Yan, Weini Zhang, Yang Liu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

37 Scopus citations

Abstract

Prosody affects the naturalness and intelligibility of speech. However, automatic prosody prediction from text remains a major challenge for Chinese speech synthesis, and the traditional conditional random field (CRF) based method relies heavily on feature engineering. In this paper, we propose using neural networks to predict prosodic boundary labels directly from Chinese characters, without any feature engineering. Experimental results show that stacking feed-forward and bidirectional long short-term memory (BLSTM) recurrent network layers outperforms the CRF-based method. Embedding features learned from raw text further improve performance.
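
As a rough illustration of what predicting prosodic boundary labels directly from characters could look like as a sequence-labeling task, the sketch below converts an annotated sentence into per-character labels. The annotation format (a "#" after the character that precedes a boundary), the label names "B" and "NB", the function name `to_char_labels`, and the example sentence are all assumptions made for illustration; the abstract does not specify the label set or data format used in the paper.

```python
# Minimal sketch of framing prosodic boundary prediction as per-character
# labeling. Everything here is hypothetical: the "#" boundary marker, the
# "B"/"NB" label names, and the example sentence are not from the paper.

def to_char_labels(annotated, marker="#"):
    """Turn an annotated sentence into a list of (character, label) pairs.

    A character followed by `marker` gets label "B" (boundary follows);
    every other character gets "NB" (no boundary).
    """
    pairs = []
    for ch in annotated:
        if ch == marker:
            if pairs:
                # Attach the boundary to the character just before the marker.
                pairs[-1] = (pairs[-1][0], "B")
        else:
            pairs.append((ch, "NB"))
    return pairs


example = "今天#天气#很好#"  # hypothetical annotated input
pairs = to_char_labels(example)
chars = [c for c, _ in pairs]    # model input: raw characters
labels = [l for _, l in pairs]   # model target: one label per character
```

Under this framing, a tagger such as a CRF or a BLSTM network would take `chars` as input and be trained to output `labels`, one label per character.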

Original language: English
Title of host publication: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 98-102
Number of pages: 5
ISBN (Electronic): 9781479972913
State: Published - 10 Feb 2016
Event: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Scottsdale, United States
Duration: 13 Dec 2015 to 17 Dec 2015

Publication series

Name: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings

Conference

Conference: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015
Country/Territory: United States
City: Scottsdale
Period: 13/12/15 to 17/12/15

Keywords

  • automatic prosody prediction
  • BLSTM
  • embedding features
  • neural network
  • speech synthesis
