TY - GEN
T1 - Predicting articulatory movement from text using deep architecture with stacked bottleneck features
AU - Wei, Zhen
AU - Wu, Zhizheng
AU - Xie, Lei
N1 - Publisher Copyright: © 2016 Asia Pacific Signal and Information Processing Association.
PY - 2017/1/17
Y1 - 2017/1/17
N2 - Using speech or text to predict articulatory movements can benefit speech-related applications. Many approaches have been proposed to solve the acoustic-to-articulatory inversion problem, which has been explored far more than the prediction of articulatory movements from text. In this paper, we investigate the feasibility of using a deep neural network (DNN) for articulatory movement prediction from text. We also combine full-context features, state and phone information with stacked bottleneck features, which provide wide linguistic context, as network input to improve articulatory movement prediction. We show on the MNGU0 data set that our DNN approach achieves a root mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirm the effectiveness of stacked bottleneck features, which can capture important contextual information.
AB - Using speech or text to predict articulatory movements can benefit speech-related applications. Many approaches have been proposed to solve the acoustic-to-articulatory inversion problem, which has been explored far more than the prediction of articulatory movements from text. In this paper, we investigate the feasibility of using a deep neural network (DNN) for articulatory movement prediction from text. We also combine full-context features, state and phone information with stacked bottleneck features, which provide wide linguistic context, as network input to improve articulatory movement prediction. We show on the MNGU0 data set that our DNN approach achieves a root mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirm the effectiveness of stacked bottleneck features, which can capture important contextual information.
KW - articulatory movement prediction
KW - deep neural network
KW - stacked bottleneck features
UR - http://www.scopus.com/inward/record.url?scp=85013812855&partnerID=8YFLogxK
U2 - 10.1109/APSIPA.2016.7820703
DO - 10.1109/APSIPA.2016.7820703
M3 - Conference contribution
AN - SCOPUS:85013812855
T3 - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
BT - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
Y2 - 13 December 2016 through 16 December 2016
ER -