TY - JOUR
T1 - BLSTM neural networks for speech driven head motion synthesis
AU - Ding, Chuang
AU - Zhu, Pengcheng
AU - Xie, Lei
N1 - Publisher Copyright: Copyright © 2015 ISCA.
PY - 2015
Y1 - 2015
N2 - Head motion naturally occurs in synchrony with speech and carries important intention, attitude and emotion factors. This paper aims to synthesize head motions from natural speech for talking avatar applications. Specifically, we study the feasibility of learning speech-to-head-motion regression models with two popular types of neural networks: feed-forward and bidirectional long short-term memory (BLSTM). We find that the BLSTM networks clearly outperform the feed-forward ones in this task because of their capacity to learn long-range speech dynamics. More interestingly, we observe that stacking different networks, i.e., inserting a feed-forward layer between two BLSTM layers, achieves the best performance. Subjective evaluation shows that this hybrid network can produce more plausible head motions from speech.
AB - Head motion naturally occurs in synchrony with speech and carries important intention, attitude and emotion factors. This paper aims to synthesize head motions from natural speech for talking avatar applications. Specifically, we study the feasibility of learning speech-to-head-motion regression models with two popular types of neural networks: feed-forward and bidirectional long short-term memory (BLSTM). We find that the BLSTM networks clearly outperform the feed-forward ones in this task because of their capacity to learn long-range speech dynamics. More interestingly, we observe that stacking different networks, i.e., inserting a feed-forward layer between two BLSTM layers, achieves the best performance. Subjective evaluation shows that this hybrid network can produce more plausible head motions from speech.
KW - BLSTM
KW - Head motion synthesis
KW - Neural networks
KW - Talking avatar
UR - http://www.scopus.com/inward/record.url?scp=84959161793&partnerID=8YFLogxK
U2 - 10.21437/interspeech.2015-137
DO - 10.21437/interspeech.2015-137
M3 - Conference article
AN - SCOPUS:84959161793
SN - 2308-457X
VL - 2015-January
SP - 3345
EP - 3349
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
Y2 - 6 September 2015 through 10 September 2015
ER -