TY - JOUR
T1 - Speech-driven head motion synthesis using neural networks
AU - Ding, Chuang
AU - Zhu, Pengcheng
AU - Xie, Lei
AU - Jiang, Dongmei
AU - Fu, Zhonghua
N1 - Publisher Copyright:
Copyright © 2014 ISCA.
PY - 2014
Y1 - 2014
AB - This paper presents a neural network approach for speech-driven head motion synthesis, which automatically predicts a speaker's head movement from his/her speech. Specifically, we realize the speech-to-head-motion mapping by learning a multi-layer perceptron from audio-visual broadcast news data. First, we show that a generatively pre-trained neural network significantly outperforms both a randomly initialized network and the hidden Markov model (HMM) approach. Second, we demonstrate that the feature combination of log Mel-scale filter-bank (FBank), energy and fundamental frequency (F0) performs best in head motion prediction. Third, we discover that using long-context acoustic information further improves performance. Finally, adding extra unlabeled training data in the pre-training stage yields further performance gains. The proposed speech-driven head motion synthesis approach increases the canonical correlation analysis (CCA) score from 0.299 (the HMM approach) to 0.565, and it can be effectively used in expressive talking avatar animation.
KW - Deep neural network
KW - Head motion synthesis
KW - Neural network
KW - Talking avatar
UR - http://www.scopus.com/inward/record.url?scp=84910030988&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84910030988
SN - 2308-457X
SP - 2303
EP - 2307
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014
Y2 - 14 September 2014 through 18 September 2014
ER -