Head motion generation for speech-driven talking avatar

Bingfeng Li; Lei Xie; Pengcheng Zhu; Bo Fan

School of Computer Science

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

This study describes methods for predicting head motion from acoustic speech. Current hidden Markov model (HMM)-based methods rely on definitions of typical head motion patterns and accurate recognition of these patterns. This study investigates the head motion prediction performance of various pattern definition strategies. The HMM method is less effective because the association between speech and head gestures is essentially a nondeterministic, many-to-many mapping, so the head motion pattern recognition accuracy is quite low. Therefore, this study treats the speech-to-head-motion mapping task as a regression problem. A back-propagation (BP) neural network is used to seek a direct, continuous mapping from the acoustic speech to the head motion. Tests show that this neural network approach significantly improves the head motion prediction accuracy and the naturalness of head movement of a talking avatar.

Original language	English
Pages (from-to)	898-902
Number of pages	5
Journal	Qinghua Daxue Xuebao/Journal of Tsinghua University
Volume	53
Issue number	6
Publication status	Published - 2013

Cite this

@article{d3206449aec947c6bcf12e41fa09a70d,

title = "Head motion generation for speech-driven talking avatar",

abstract = "This study describes methods for predicting head motion from acoustic speech. Current hidden Markov model (HMM)-based methods rely on definitions of typical head motion patterns and accurate recognition of these patterns. This study investigates the head motion prediction performance of various pattern definition strategies. The HMM method is less effective because the association between speech and head gestures is essentially a nondeterministic, many-to-many mapping, so the head motion pattern recognition accuracy is quite low. Therefore, this study treats the speech-to-head-motion mapping task as a regression problem. A back-propagation (BP) neural network is used to seek a direct, continuous mapping from the acoustic speech to the head motion. Tests show that this neural network approach significantly improves the head motion prediction accuracy and the naturalness of head movement of a talking avatar.",

keywords = "Head motion generation, Hidden Markov model, Neural network, Talking avatar, Talking head",

author = "Bingfeng Li and Lei Xie and Pengcheng Zhu and Bo Fan",

year = "2013",

language = "English",

volume = "53",

pages = "898--902",

journal = "Qinghua Daxue Xuebao/Journal of Tsinghua University",

issn = "1000-0054",

publisher = "Tsinghua University Press",

number = "6",

}

TY - JOUR

T1 - Head motion generation for speech-driven talking avatar

AU - Li, Bingfeng

AU - Xie, Lei

AU - Zhu, Pengcheng

AU - Fan, Bo

PY - 2013

Y1 - 2013

N2 - This study describes methods for predicting head motion from acoustic speech. Current hidden Markov model (HMM)-based methods rely on definitions of typical head motion patterns and accurate recognition of these patterns. This study investigates the head motion prediction performance of various pattern definition strategies. The HMM method is less effective because the association between speech and head gestures is essentially a nondeterministic, many-to-many mapping, so the head motion pattern recognition accuracy is quite low. Therefore, this study treats the speech-to-head-motion mapping task as a regression problem. A back-propagation (BP) neural network is used to seek a direct, continuous mapping from the acoustic speech to the head motion. Tests show that this neural network approach significantly improves the head motion prediction accuracy and the naturalness of head movement of a talking avatar.

AB - This study describes methods for predicting head motion from acoustic speech. Current hidden Markov model (HMM)-based methods rely on definitions of typical head motion patterns and accurate recognition of these patterns. This study investigates the head motion prediction performance of various pattern definition strategies. The HMM method is less effective because the association between speech and head gestures is essentially a nondeterministic, many-to-many mapping, so the head motion pattern recognition accuracy is quite low. Therefore, this study treats the speech-to-head-motion mapping task as a regression problem. A back-propagation (BP) neural network is used to seek a direct, continuous mapping from the acoustic speech to the head motion. Tests show that this neural network approach significantly improves the head motion prediction accuracy and the naturalness of head movement of a talking avatar.

KW - Head motion generation

KW - Hidden Markov model

KW - Neural network

KW - Talking avatar

KW - Talking head

UR - http://www.scopus.com/inward/record.url?scp=84886310910&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84886310910

SN - 1000-0054

VL - 53

SP - 898

EP - 902

JO - Qinghua Daxue Xuebao/Journal of Tsinghua University

JF - Qinghua Daxue Xuebao/Journal of Tsinghua University

IS - 6

ER -
