Predicting articulatory movement from text using deep architecture with stacked bottleneck features

Zhen Wei; Zhizheng Wu; Lei Xie

doi:10.1109/APSIPA.2016.7820703

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Using speech or text to predict articulatory movements has potential benefits for speech-related applications. Many approaches have been proposed for the acoustic-to-articulatory inversion problem, which has been explored far more than the prediction of articulatory movements from text. In this paper, we investigate the feasibility of using a deep neural network (DNN) for articulatory movement prediction from text. To improve prediction performance, we also combine full-context features and state and phone information with stacked bottleneck features, which provide wide linguistic context, as network input. We show on the MNGU0 data set that our DNN approach achieves a root mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirm the effectiveness of stacked bottleneck features, which can include important contextual information.
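The two quantities named in the abstract, context stacking and RMSE, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that "stacked" means concatenating each frame's features with those of neighbouring frames, and that RMSE is averaged over all frames and channels; the record confirms neither detail. The function names `stack_context` and `rmse` and the default `offsets` are hypothetical.

```python
import math


def stack_context(frames, offsets=(-2, -1, 0, 1, 2)):
    """Concatenate each frame with its neighbours at the given offsets.

    `frames` is a list of equal-length feature lists, one per time step.
    Indices that fall outside the sequence are clamped, so edge frames
    reuse the first or last frame (an assumed padding choice).
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        row = []
        for k in offsets:
            j = min(max(t + k, 0), n - 1)  # clamp to a valid frame index
            row.extend(frames[j])
        stacked.append(row)
    return stacked


def rmse(predicted, measured):
    """Root mean-squared error over all frames and channels.

    Both arguments are lists of per-frame value lists in the same unit
    (for articulatory positions, millimetres).
    """
    squared = [
        (p - m) ** 2
        for p_row, m_row in zip(predicted, measured)
        for p, m in zip(p_row, m_row)
    ]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    toy = [[0, 1], [2, 3], [4, 5]]
    print(stack_context(toy, offsets=(-1, 0, 1)))
    print(rmse([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 6.0]]))
```

With a window of five offsets, each stacked frame is five times as wide as the input frame, which is one way a per-frame network input can carry wider context.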

Original language	English
Title of host publication	2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9789881476821
DOIs	https://doi.org/10.1109/APSIPA.2016.7820703
State	Published - 17 Jan 2017
Externally published	Yes
Event	2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016 - Jeju, Korea, Republic of; Duration: 13 Dec 2016 → 16 Dec 2016

Publication series

Name	2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016

Conference

Conference	2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
Country/Territory	Korea, Republic of
City	Jeju
Period	13/12/16 → 16/12/16

Keywords

articulatory movement prediction
deep neural network
stacked bottleneck features

Access to Document

10.1109/APSIPA.2016.7820703

Cite this

Wei, Z., Wu, Z., & Xie, L. (2017). Predicting articulatory movement from text using deep architecture with stacked bottleneck features. In 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016 Article 7820703 (2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/APSIPA.2016.7820703

Wei, Zhen ; Wu, Zhizheng ; Xie, Lei. / Predicting articulatory movement from text using deep architecture with stacked bottleneck features. 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016. Institute of Electrical and Electronics Engineers Inc., 2017. (2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016).

@inproceedings{e9cde9715f5a41f18cc78d778ba3e4f2,

title = "Predicting articulatory movement from text using deep architecture with stacked bottleneck features",

abstract = "Using speech or text to predict articulatory movements has potential benefits for speech-related applications. Many approaches have been proposed for the acoustic-to-articulatory inversion problem, which has been explored far more than the prediction of articulatory movements from text. In this paper, we investigate the feasibility of using a deep neural network (DNN) for articulatory movement prediction from text. To improve prediction performance, we also combine full-context features and state and phone information with stacked bottleneck features, which provide wide linguistic context, as network input. We show on the MNGU0 data set that our DNN approach achieves a root mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirm the effectiveness of stacked bottleneck features, which can include important contextual information.",

keywords = "articulatory movement prediction, deep neural network, stacked bottleneck features",

author = "Zhen Wei and Zhizheng Wu and Lei Xie",

note = "Publisher Copyright: {\textcopyright} 2016 Asia Pacific Signal and Information Processing Association.; 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016 ; Conference date: 13-12-2016 Through 16-12-2016",

year = "2017",

month = jan,

day = "17",

doi = "10.1109/APSIPA.2016.7820703",

language = "English",

series = "2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016",

}

Wei, Z, Wu, Z & Xie, L 2017, Predicting articulatory movement from text using deep architecture with stacked bottleneck features. in 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016., 7820703, 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016, Institute of Electrical and Electronics Engineers Inc., 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016, Jeju, Korea, Republic of, 13/12/16. https://doi.org/10.1109/APSIPA.2016.7820703

Predicting articulatory movement from text using deep architecture with stacked bottleneck features. / Wei, Zhen; Wu, Zhizheng; Xie, Lei.
2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016. Institute of Electrical and Electronics Engineers Inc., 2017. 7820703 (2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016).


TY - GEN

T1 - Predicting articulatory movement from text using deep architecture with stacked bottleneck features

AU - Wei, Zhen

AU - Wu, Zhizheng

AU - Xie, Lei

PY - 2017/1/17

Y1 - 2017/1/17

N2 - Using speech or text to predict articulatory movements has potential benefits for speech-related applications. Many approaches have been proposed for the acoustic-to-articulatory inversion problem, which has been explored far more than the prediction of articulatory movements from text. In this paper, we investigate the feasibility of using a deep neural network (DNN) for articulatory movement prediction from text. To improve prediction performance, we also combine full-context features and state and phone information with stacked bottleneck features, which provide wide linguistic context, as network input. We show on the MNGU0 data set that our DNN approach achieves a root mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirm the effectiveness of stacked bottleneck features, which can include important contextual information.

AB - Using speech or text to predict articulatory movements has potential benefits for speech-related applications. Many approaches have been proposed for the acoustic-to-articulatory inversion problem, which has been explored far more than the prediction of articulatory movements from text. In this paper, we investigate the feasibility of using a deep neural network (DNN) for articulatory movement prediction from text. To improve prediction performance, we also combine full-context features and state and phone information with stacked bottleneck features, which provide wide linguistic context, as network input. We show on the MNGU0 data set that our DNN approach achieves a root mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirm the effectiveness of stacked bottleneck features, which can include important contextual information.

KW - articulatory movement prediction

KW - deep neural network

KW - stacked bottleneck features

UR - http://www.scopus.com/inward/record.url?scp=85013812855&partnerID=8YFLogxK

U2 - 10.1109/APSIPA.2016.7820703

DO - 10.1109/APSIPA.2016.7820703

M3 - Conference contribution

AN - SCOPUS:85013812855

T3 - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016

BT - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016

Y2 - 13 December 2016 through 16 December 2016

ER -

Wei Z, Wu Z, Xie L. Predicting articulatory movement from text using deep architecture with stacked bottleneck features. In 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016. Institute of Electrical and Electronics Engineers Inc. 2017. 7820703. (2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016). doi: 10.1109/APSIPA.2016.7820703
