Automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features

Chuang Ding; Lei Xie; Jie Yan; Weini Zhang; Yang Liu

doi:10.1109/ASRU.2015.7404780

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

37 Scopus citations

Abstract

Prosody affects the naturalness and intelligibility of speech. However, automatic prosody prediction from text remains a major challenge for Chinese speech synthesis, and the traditional method based on conditional random fields (CRF) relies heavily on feature engineering. In this paper, we propose using neural networks to predict prosodic boundary labels directly from Chinese characters, without any feature engineering. Experimental results show that stacking feed-forward and bidirectional long short-term memory (BLSTM) recurrent network layers outperforms the CRF-based method. Embedding features learned from raw text further improve performance.
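
The task framing in the abstract (one prosodic boundary label per Chinese character) can be illustrated with a minimal sketch. The marker scheme used here ("#1" prosodic word, "#2" prosodic phrase, "#3" intonational phrase), the label names, and the function `to_char_labels` are assumptions made for illustration; the record does not state the paper's annotation format or label set.

```python
import re
from typing import List, Tuple

# Hypothetical boundary markers: "#1" prosodic word, "#2" prosodic phrase,
# "#3" intonational phrase. This scheme is an assumption, not taken from the paper.
BOUNDARY = re.compile(r"#([123])")


def to_char_labels(annotated: str) -> List[Tuple[str, str]]:
    """Turn an annotated sentence into (character, label) pairs.

    Each character is labeled with the boundary that directly follows it
    ("B1", "B2", "B3"), or "O" when no boundary follows.
    """
    pairs: List[Tuple[str, str]] = []
    pos = 0
    for match in BOUNDARY.finditer(annotated):
        chunk = annotated[pos:match.start()]
        for ch in chunk:
            pairs.append((ch, "O"))
        if chunk:
            # The last character before the marker carries the boundary label.
            pairs[-1] = (pairs[-1][0], "B" + match.group(1))
        pos = match.end()
    # Characters after the final marker have no following boundary.
    for ch in annotated[pos:]:
        pairs.append((ch, "O"))
    return pairs


if __name__ == "__main__":
    for ch, label in to_char_labels("今天#1天气#2很好#3"):
        print(ch, label)
```

Pairs of this kind are what a character-level sequence tagger, whether a CRF or a BLSTM network, would take as input and target; the model itself is not sketched here.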

Original language	English
Title of host publication	2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	98-102
Number of pages	5
ISBN (Electronic)	9781479972913
DOIs	https://doi.org/10.1109/ASRU.2015.7404780
State	Published - 10 Feb 2016
Event	IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Scottsdale, United States
Duration	13 Dec 2015 → 17 Dec 2015

Publication series

Name	2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings

Conference

Conference	IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015
Country/Territory	United States
City	Scottsdale
Period	13/12/15 → 17/12/15

Keywords

automatic prosody prediction
BLSTM
embedding features
neural network
speech synthesis

Cite this

Ding, C., Xie, L., Yan, J., Zhang, W., & Liu, Y. (2016). Automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings (pp. 98-102). Article 7404780 (2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU.2015.7404780

Ding, Chuang; Xie, Lei; Yan, Jie et al. / Automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features. 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2016. pp. 98-102 (2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings).

@inproceedings{b5ccc77c13ee412fbf8246ba0dee8836,
title = "Automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features",
abstract = "Prosody affects the naturalness and intelligibility of speech. However, automatic prosody prediction from text for Chinese speech synthesis is still a great challenge and the traditional conditional random fields (CRF) based method always heavily relies on feature engineering. In this paper, we propose to use neural networks to predict prosodic boundary labels directly from Chinese characters without any feature engineering. Experimental results show that stacking feed-forward and bidirectional long short-term memory (BLSTM) recurrent network layers achieves superior performance over the CRF-based method. The embedding features learned from raw text further enhance the performance.",
keywords = "automatic prosody prediction, BLSTM, embedding features, neural network, speech synthesis",
author = "Chuang Ding and Lei Xie and Jie Yan and Weini Zhang and Yang Liu",
note = "Publisher Copyright: {\textcopyright} 2015 IEEE.; IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 ; Conference date: 13-12-2015 Through 17-12-2015",
year = "2016",
month = feb,
day = "10",
doi = "10.1109/ASRU.2015.7404780",
language = "English",
series = "2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "98--102",
booktitle = "2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings",
}

Ding, C, Xie, L, Yan, J, Zhang, W & Liu, Y 2016, Automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features. in 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings, 7404780, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings, Institute of Electrical and Electronics Engineers Inc., pp. 98-102, IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, United States, 13/12/15. https://doi.org/10.1109/ASRU.2015.7404780

Automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features. / Ding, Chuang; Xie, Lei; Yan, Jie et al.
2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2016. p. 98-102 7404780 (2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings).

TY - GEN
T1 - Automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features
AU - Ding, Chuang
AU - Xie, Lei
AU - Yan, Jie
AU - Zhang, Weini
AU - Liu, Yang
PY - 2016/2/10
Y1 - 2016/2/10
AB - Prosody affects the naturalness and intelligibility of speech. However, automatic prosody prediction from text for Chinese speech synthesis is still a great challenge and the traditional conditional random fields (CRF) based method always heavily relies on feature engineering. In this paper, we propose to use neural networks to predict prosodic boundary labels directly from Chinese characters without any feature engineering. Experimental results show that stacking feed-forward and bidirectional long short-term memory (BLSTM) recurrent network layers achieves superior performance over the CRF-based method. The embedding features learned from raw text further enhance the performance.
KW - automatic prosody prediction
KW - BLSTM
KW - embedding features
KW - neural network
KW - speech synthesis
UR - http://www.scopus.com/inward/record.url?scp=84964556012&partnerID=8YFLogxK
U2 - 10.1109/ASRU.2015.7404780
DO - 10.1109/ASRU.2015.7404780
M3 - Conference contribution
AN - SCOPUS:84964556012
T3 - 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
SP - 98
EP - 102
BT - 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015
Y2 - 13 December 2015 through 17 December 2015
ER -

Ding C, Xie L, Yan J, Zhang W, Liu Y. Automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2016. p. 98-102. 7404780. (2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings). doi: 10.1109/ASRU.2015.7404780
