Articulatory movement prediction using deep bidirectional long short-term memory based recurrent neural networks and word/phone embeddings

  • Pengcheng Zhu
  • Lei Xie
  • Yunlin Chen

Research output: Contribution to journal › Conference article › peer-review

34 Citations (Scopus)

Abstract

Automatic prediction of articulatory movements from speech or text can benefit many applications, such as speech recognition and synthesis. A recent approach reported state-of-the-art performance in speech-to-articulatory prediction using feed-forward neural networks. In this paper, we investigate the feasibility of using bidirectional long short-term memory based recurrent neural networks (BLSTM-RNNs) for articulatory movement prediction because of their long-context trajectory modeling ability. We show on the MNGU0 dataset that BLSTM-RNNs clearly outperform feed-forward networks and push the state-of-the-art RMSE from 0.885 mm to 0.565 mm. On the other hand, predicting articulatory information from text relies heavily on handcrafted linguistic and prosodic features, e.g., POS and TOBI labels. In this paper, we propose to use word and phone embeddings to substitute for these manual features. Word/phone embedding features are learned automatically from unlabeled text data by a neural network language model. We show that word and phone embeddings achieve comparable performance without using POS and TOBI features. More promisingly, the lowest RMSE is achieved by combining the conventional full feature set with phone embeddings.
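The core of the speech-to-articulatory approach described above is a bidirectional recurrent network that maps a sequence of acoustic frames to a sequence of articulator coordinates, using both past and future context at every frame. The following is a minimal NumPy sketch of that idea, not the paper's actual model: a single bidirectional LSTM layer whose forward and backward hidden states are concatenated and mapped by a linear output layer to an articulatory trajectory. All dimensions, initializations, and parameter names here are illustrative assumptions.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step: input/forget/output gates and candidate cell."""
    H = h.shape[0]
    z = W @ x + U @ h + b                  # stacked gate pre-activations, shape (4H,)
    i = 1.0 / (1.0 + np.exp(-z[:H]))       # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))    # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*H:3*H]))  # output gate
    g = np.tanh(z[3*H:])                   # candidate cell state
    c = f * c + i * g                      # new cell state
    h = o * np.tanh(c)                     # new hidden state
    return h, c

def lstm_pass(X, W, U, b):
    """Run an LSTM over all frames in X (shape (T, F)); return hidden states (T, H)."""
    H = b.shape[0] // 4
    h, c = np.zeros(H), np.zeros(H)
    out = []
    for x in X:
        h, c = lstm_step(x, h, c, W, U, b)
        out.append(h)
    return np.stack(out)

def blstm_predict(X, params_fwd, params_bwd, V, d):
    """Bidirectional pass: concatenate forward and (time-reversed) backward
    hidden states, then map linearly to articulator coordinates."""
    fwd = lstm_pass(X, *params_fwd)              # left-to-right context
    bwd = lstm_pass(X[::-1], *params_bwd)[::-1]  # right-to-left, realigned in time
    feats = np.concatenate([fwd, bwd], axis=1)   # (T, 2H)
    return feats @ V + d                         # (T, D) articulatory trajectory

# Illustrative shapes: T frames, F acoustic dims, H hidden units, D articulator dims.
rng = np.random.default_rng(0)
T, F, H, D = 10, 40, 8, 12
X = rng.standard_normal((T, F))

def init(F, H):
    return (0.1 * rng.standard_normal((4 * H, F)),   # input weights W
            0.1 * rng.standard_normal((4 * H, H)),   # recurrent weights U
            np.zeros(4 * H))                         # bias b

Y = blstm_predict(X, init(F, H), init(F, H),
                  0.1 * rng.standard_normal((2 * H, D)), np.zeros(D))
print(Y.shape)  # (10, 12): one articulator coordinate vector per input frame
```

In practice the paper's network is deeper and trained by backpropagation through time to minimize RMSE against measured articulator positions; the sketch only shows why every output frame depends on the whole utterance, which is the long-context property motivating BLSTM-RNNs over feed-forward models.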

Original language: English
Pages (from-to): 2192-2196
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2015-January
Publication status: Published - 2015
Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
Duration: 6 Sep 2015 - 10 Sep 2015
