Head motion generation for speech-driven talking avatar

Bingfeng Li; Lei Xie; Pengcheng Zhu; Bo Fan

School of Computer Science

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

This study describes methods for predicting head motion from acoustic speech. Current hidden Markov model (HMM)-based methods rely on definitions of typical head motion patterns and accurate recognition of these patterns. This study investigates the head motion prediction performance of various pattern definition strategies. The HMM method is less effective because the association between speech and head gestures is essentially a nondeterministic, many-to-many mapping, so the head motion pattern recognition accuracy is quite low. Therefore, this study treats the speech-to-head-motion mapping task as a regression problem. A back-propagation (BP) neural network is used to seek a direct, continuous mapping from the acoustic speech to the head motion. Tests show that this neural network approach significantly improves the head motion prediction accuracy and the naturalness of head movement of a talking avatar.

Original language	English
Pages (from-to)	898-902
Number of pages	5
Journal	Qinghua Daxue Xuebao/Journal of Tsinghua University
Volume	53
Issue number	6
State	Published - 2013

Keywords

Head motion generation
Hidden Markov model
Neural network
Talking avatar
Talking head

Cite this

@article{d3206449aec947c6bcf12e41fa09a70d,

title = "Head motion generation for speech-driven talking avatar",

abstract = "This study describes methods for predicting head motion from acoustic speech. Current hidden Markov model (HMM)-based methods rely on definitions of typical head motion patterns and accurate recognition of these patterns. This study investigates the head motion prediction performance of various pattern definition strategies. The HMM method is less effective because the association between speech and head gestures is essentially a nondeterministic, many-to-many mapping, so the head motion pattern recognition accuracy is quite low. Therefore, this study treats the speech-to-head-motion mapping task as a regression problem. A back-propagation (BP) neural network is used to seek a direct, continuous mapping from the acoustic speech to the head motion. Tests show that this neural network approach significantly improves the head motion prediction accuracy and the naturalness of head movement of a talking avatar.",

keywords = "Head motion generation, Hidden Markov model, Neural network, Talking avatar, Talking head",

author = "Bingfeng Li and Lei Xie and Pengcheng Zhu and Bo Fan",

year = "2013",

language = "English",

volume = "53",

pages = "898--902",

journal = "Qinghua Daxue Xuebao/Journal of Tsinghua University",

issn = "1000-0054",

publisher = "Tsinghua University Press",

number = "6",

}

TY - JOUR

T1 - Head motion generation for speech-driven talking avatar

AU - Li, Bingfeng

AU - Xie, Lei

AU - Zhu, Pengcheng

AU - Fan, Bo

PY - 2013

Y1 - 2013

N2 - This study describes methods for predicting head motion from acoustic speech. Current hidden Markov model (HMM)-based methods rely on definitions of typical head motion patterns and accurate recognition of these patterns. This study investigates the head motion prediction performance of various pattern definition strategies. The HMM method is less effective because the association between speech and head gestures is essentially a nondeterministic, many-to-many mapping, so the head motion pattern recognition accuracy is quite low. Therefore, this study treats the speech-to-head-motion mapping task as a regression problem. A back-propagation (BP) neural network is used to seek a direct, continuous mapping from the acoustic speech to the head motion. Tests show that this neural network approach significantly improves the head motion prediction accuracy and the naturalness of head movement of a talking avatar.

AB - This study describes methods for predicting head motion from acoustic speech. Current hidden Markov model (HMM)-based methods rely on definitions of typical head motion patterns and accurate recognition of these patterns. This study investigates the head motion prediction performance of various pattern definition strategies. The HMM method is less effective because the association between speech and head gestures is essentially a nondeterministic, many-to-many mapping, so the head motion pattern recognition accuracy is quite low. Therefore, this study treats the speech-to-head-motion mapping task as a regression problem. A back-propagation (BP) neural network is used to seek a direct, continuous mapping from the acoustic speech to the head motion. Tests show that this neural network approach significantly improves the head motion prediction accuracy and the naturalness of head movement of a talking avatar.

KW - Head motion generation

KW - Hidden Markov model

KW - Neural network

KW - Talking avatar

KW - Talking head

UR - http://www.scopus.com/inward/record.url?scp=84886310910&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84886310910

SN - 1000-0054

VL - 53

SP - 898

EP - 902

JO - Qinghua Daxue Xuebao/Journal of Tsinghua University

JF - Qinghua Daxue Xuebao/Journal of Tsinghua University

IS - 6

ER -
