Abstract
This study describes methods for predicting head motion from acoustic speech. Current hidden Markov model (HMM)-based methods rely on definitions of typical head motion patterns and on accurate recognition of those patterns. This study examines how different pattern definition strategies affect head motion prediction performance. The HMM approach is less effective because the association between speech and head gestures is essentially a nondeterministic, many-to-many mapping, so head motion pattern recognition accuracy is low. This study therefore treats the speech-to-head-motion mapping task as a regression problem, using a back-propagation (BP) neural network to learn a direct, continuous mapping from acoustic speech to head motion. Tests show that this neural network approach significantly improves both the head motion prediction accuracy and the naturalness of the head movement of a talking avatar.
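The regression formulation can be illustrated with a minimal sketch of a BP network, assuming per-frame acoustic feature vectors as inputs and continuous head-motion parameters (here three rotation angles) as targets. The class name `BPRegressor`, the single tanh hidden layer, the layer sizes, the learning rate, and the synthetic training data are hypothetical choices for illustration only, not the configuration used in the study.

```python
import numpy as np


class BPRegressor:
    """Hypothetical one-hidden-layer MLP trained by back-propagation.

    Inputs: per-frame acoustic feature vectors (assumed, e.g. prosodic/spectral features).
    Outputs: continuous head-motion parameters (assumed, e.g. three rotation angles).
    """

    def __init__(self, n_in, n_hidden, n_out, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, X):
        # tanh hidden layer, linear output layer for continuous regression targets
        self.H = np.tanh(X @ self.W1 + self.b1)
        return self.H @ self.W2 + self.b2

    def train_step(self, X, Y):
        """One full-batch gradient-descent step; returns the loss before the update."""
        n = X.shape[0]
        err = self.forward(X) - Y
        loss = 0.5 * np.mean(np.sum(err ** 2, axis=1))  # half mean squared error per frame
        # Back-propagate the output error (computed before any weights change)
        dW2 = self.H.T @ err / n
        db2 = err.mean(axis=0)
        dH = (err @ self.W2.T) * (1.0 - self.H ** 2)  # tanh derivative
        dW1 = X.T @ dH / n
        db1 = dH.mean(axis=0)
        # Plain gradient-descent update
        self.W2 -= self.lr * dW2
        self.b2 -= self.lr * db2
        self.W1 -= self.lr * dW1
        self.b1 -= self.lr * db1
        return loss


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 13))              # synthetic stand-in for 13-dim acoustic frames
    Y = np.sin(X @ (rng.normal(size=(13, 3)) * 0.3))  # synthetic stand-in for 3 head angles
    net = BPRegressor(n_in=13, n_hidden=32, n_out=3, lr=0.1)
    first = net.train_step(X, Y)
    for _ in range(2000):
        last = net.train_step(X, Y)
    print(f"training loss: {first:.4f} -> {last:.4f}")
```

The linear output layer is what makes this a regression model: it emits continuous head-motion values for every frame, whereas an HMM-based recognizer outputs one of a fixed set of discrete pattern labels.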
Original language | English |
---|---|
Pages (from-to) | 898-902 |
Number of pages | 5 |
Journal | Qinghua Daxue Xuebao/Journal of Tsinghua University |
Volume | 53 |
Issue number | 6 |
State | Published - 2013 |
Keywords
- Head motion generation
- Hidden Markov model
- Neural network
- Talking avatar
- Talking head