Predicting articulatory movement from text using deep architecture with stacked bottleneck features

Zhen Wei; Zhizheng Wu; Lei Xie

doi:10.1109/APSIPA.2016.7820703

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Predicting articulatory movements from speech or text may benefit speech-related applications. Many approaches have been proposed for the acoustic-to-articulatory inversion problem, whereas predicting articulatory movements from text has been explored far less. In this paper, we investigate the feasibility of using a deep neural network (DNN) to predict articulatory movements from text. To improve prediction performance, we also combine full-context features and state and phone information with stacked bottleneck features, which provide wide linguistic context, as the network input. On the MNGU0 data set, our DNN approach achieves a root-mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirm the effectiveness of stacked bottleneck features, which may capture important contextual information.
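The abstract does not spell out how the stacked bottleneck input or the RMSE figure is computed. The Python sketch below shows one common construction, under assumptions not stated in the abstract: bottleneck-layer outputs of a first network are taken for the current frame and `context` neighbouring frames on each side, concatenated, and appended to that frame's linguistic features, with frames beyond the utterance edges replaced by the edge frame. The function names, the window size, and the edge handling are hypothetical. The RMSE helper assumes per-channel RMSE averaged over articulatory channels, which is only one of several conventions in use, so its output is not necessarily comparable with the 0.7370 mm figure.

```python
from math import sqrt
from typing import List

Frame = List[float]  # one feature vector per frame


def stack_bottleneck(bottleneck: List[Frame], context: int) -> List[Frame]:
    """Concatenate each frame's bottleneck vector with those of the
    `context` preceding and `context` following frames.

    Hypothetical helper: frames outside the utterance are replaced by the
    nearest edge frame, so every output row has (2 * context + 1) * D values.
    """
    n_frames = len(bottleneck)
    stacked = []
    for t in range(n_frames):
        row: Frame = []
        for offset in range(-context, context + 1):
            # Clamp the neighbour index to [0, n_frames - 1] (edge padding).
            idx = min(max(t + offset, 0), n_frames - 1)
            row.extend(bottleneck[idx])
        stacked.append(row)
    return stacked


def build_network_input(linguistic: List[Frame],
                        bottleneck: List[Frame],
                        context: int) -> List[Frame]:
    """Append the stacked bottleneck features to each frame's linguistic
    features (e.g. full-context, state and phone information)."""
    if len(linguistic) != len(bottleneck):
        raise ValueError("linguistic and bottleneck features must be frame-aligned")
    stacked = stack_bottleneck(bottleneck, context)
    # List concatenation builds a new row; the inputs are not modified.
    return [ling + sbn for ling, sbn in zip(linguistic, stacked)]


def mean_channel_rmse(predicted: List[Frame], reference: List[Frame]) -> float:
    """Assumed metric: RMSE computed separately for each articulatory channel
    over all frames, then averaged across channels (same unit as the data)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need equal, non-zero numbers of frames")
    n_frames = len(reference)
    n_channels = len(reference[0])
    per_channel = []
    for c in range(n_channels):
        squared = sum((p[c] - r[c]) ** 2 for p, r in zip(predicted, reference))
        per_channel.append(sqrt(squared / n_frames))
    return sum(per_channel) / n_channels
```

For example, with three frames of 2-dimensional bottleneck features `[[0, 1], [2, 3], [4, 5]]` and `context=1`, the first stacked row is `[0, 1, 0, 1, 2, 3]`, because the first frame is repeated as its own left neighbour. A wider `context` gives the network more surrounding linguistic information at the cost of a larger input layer.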

Original language	English
Host publication title	2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9789881476821
DOI	https://doi.org/10.1109/APSIPA.2016.7820703
Publication status	Published - 17 Jan 2017
Externally published	Yes
Event	2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016 - Jeju, Korea, Republic of. Duration: 13 Dec 2016 → 16 Dec 2016

Publication series

Name	2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016

Conference

Conference	2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
Country/Territory	Korea, Republic of
City	Jeju
Period	13/12/16 → 16/12/16

Cite this

Wei, Z., Wu, Z., & Xie, L. (2017). Predicting articulatory movement from text using deep architecture with stacked bottleneck features. In 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016 Article 7820703 (2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/APSIPA.2016.7820703

Wei, Zhen ; Wu, Zhizheng ; Xie, Lei. / Predicting articulatory movement from text using deep architecture with stacked bottleneck features. 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016. Institute of Electrical and Electronics Engineers Inc., 2017. (2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016).

@inproceedings{e9cde9715f5a41f18cc78d778ba3e4f2,

title = "Predicting articulatory movement from text using deep architecture with stacked bottleneck features",

abstract = "Predicting articulatory movements from speech or text may benefit speech-related applications. Many approaches have been proposed for the acoustic-to-articulatory inversion problem, whereas predicting articulatory movements from text has been explored far less. In this paper, we investigate the feasibility of using a deep neural network (DNN) to predict articulatory movements from text. To improve prediction performance, we also combine full-context features and state and phone information with stacked bottleneck features, which provide wide linguistic context, as the network input. On the MNGU0 data set, our DNN approach achieves a root-mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirm the effectiveness of stacked bottleneck features, which may capture important contextual information.",

keywords = "articulatory movement prediction, deep neural network, stacked bottleneck features",

author = "Zhen Wei and Zhizheng Wu and Lei Xie",

note = "Publisher Copyright: {\textcopyright} 2016 Asia Pacific Signal and Information Processing Association.; 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016 ; Conference date: 13-12-2016 Through 16-12-2016",

year = "2017",

month = jan,

day = "17",

doi = "10.1109/APSIPA.2016.7820703",

language = "English",

series = "2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016",

}

Wei, Z, Wu, Z & Xie, L 2017, Predicting articulatory movement from text using deep architecture with stacked bottleneck features. In 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016., 7820703, 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016, Institute of Electrical and Electronics Engineers Inc., 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016, Jeju, Korea, Republic of, 13/12/16. https://doi.org/10.1109/APSIPA.2016.7820703

Predicting articulatory movement from text using deep architecture with stacked bottleneck features. / Wei, Zhen; Wu, Zhizheng; Xie, Lei.
2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016. Institute of Electrical and Electronics Engineers Inc., 2017. 7820703 (2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016).

TY - GEN

T1 - Predicting articulatory movement from text using deep architecture with stacked bottleneck features

AU - Wei, Zhen

AU - Wu, Zhizheng

AU - Xie, Lei

PY - 2017/1/17

Y1 - 2017/1/17

N2 - Predicting articulatory movements from speech or text may benefit speech-related applications. Many approaches have been proposed for the acoustic-to-articulatory inversion problem, whereas predicting articulatory movements from text has been explored far less. In this paper, we investigate the feasibility of using a deep neural network (DNN) to predict articulatory movements from text. To improve prediction performance, we also combine full-context features and state and phone information with stacked bottleneck features, which provide wide linguistic context, as the network input. On the MNGU0 data set, our DNN approach achieves a root-mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirm the effectiveness of stacked bottleneck features, which may capture important contextual information.

AB - Predicting articulatory movements from speech or text may benefit speech-related applications. Many approaches have been proposed for the acoustic-to-articulatory inversion problem, whereas predicting articulatory movements from text has been explored far less. In this paper, we investigate the feasibility of using a deep neural network (DNN) to predict articulatory movements from text. To improve prediction performance, we also combine full-context features and state and phone information with stacked bottleneck features, which provide wide linguistic context, as the network input. On the MNGU0 data set, our DNN approach achieves a root-mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirm the effectiveness of stacked bottleneck features, which may capture important contextual information.

KW - articulatory movement prediction

KW - deep neural network

KW - stacked bottleneck features

UR - http://www.scopus.com/inward/record.url?scp=85013812855&partnerID=8YFLogxK

U2 - 10.1109/APSIPA.2016.7820703

DO - 10.1109/APSIPA.2016.7820703

M3 - Conference contribution

AN - SCOPUS:85013812855

T3 - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016

BT - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016

Y2 - 13 December 2016 through 16 December 2016

ER -

Wei Z, Wu Z, Xie L. Predicting articulatory movement from text using deep architecture with stacked bottleneck features. In 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016. Institute of Electrical and Electronics Engineers Inc. 2017. 7820703. (2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016). doi: 10.1109/APSIPA.2016.7820703

Predicting articulatory movement from text using deep architecture with stacked bottleneck features

摘要

出版系列

会议

访问文件

其它文件与链接

指纹

引用此