A statistical parametric approach to video-realistic text-driven talking avatar

Lei Xie; Naicai Sun; Bo Fan

doi:10.1007/s11042-013-1633-3

School of Computer Science

Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Article › peer-review

16 Scopus citations

Abstract

This paper proposes a statistical parametric approach to a video-realistic, text-driven talking avatar. We follow the trajectory HMM approach, in which audio and visual speech are jointly modeled by HMMs and continuous audiovisual speech parameter trajectories are synthesized under the maximum likelihood criterion. Previous trajectory HMM approaches focus only on mouth animation, synthesizing either simple geometric mouth shapes or video-realistic lip motion. Our approach uses a trajectory HMM to generate visual parameters of the lower face and achieves video-realistic animation of the whole face. Specifically, we use an active appearance model (AAM) to model visual speech; the AAM offers a convenient and compact statistical model of both the shape and the appearance variations of the face. To achieve high-fidelity video-realistic results, we use Poisson image editing to stitch the synthesized lower-face image seamlessly into a whole-face image. Objective and subjective experiments show that the proposed approach can produce natural facial animation.
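The synthesis step named in the abstract, generating a continuous parameter trajectory under the maximum likelihood criterion, can be illustrated with the standard HMM-synthesis formulation: stack static and dynamic features as o = Wc and solve (WᵀU⁻¹W)c = WᵀU⁻¹μ for the static trajectory c. The sketch below is a minimal NumPy illustration, not the authors' implementation. It assumes one visual parameter dimension (for example, a single AAM coefficient), diagonal covariances, a state alignment already expanded into per-frame means and variances, and a static/delta/delta-delta window set with edge-frame replication; the window coefficients, the function names `build_window_matrix` and `mlpg`, and the toy data are all hypothetical choices. It covers only trajectory generation, not trajectory-HMM training, AAM rendering, or Poisson blending.

```python
import numpy as np

# Assumed (hypothetical) regression windows: static, delta, delta-delta.
# Each maps a frame offset to its coefficient.
WINDOWS = [
    {0: 1.0},                       # static feature
    {-1: -0.5, 1: 0.5},             # first-order (delta) feature
    {-1: 1.0, 0: -2.0, 1: 1.0},     # second-order (delta-delta) feature
]


def build_window_matrix(T, windows=WINDOWS):
    """Build W so that o = W @ c stacks [static, delta, delta-delta] per frame."""
    D = len(windows)
    W = np.zeros((D * T, T))
    for t in range(T):
        for d, win in enumerate(windows):
            for offset, coef in win.items():
                tau = min(max(t + offset, 0), T - 1)  # replicate edge frames
                W[D * t + d, tau] += coef
    return W


def mlpg(means, variances, windows=WINDOWS):
    """Maximum-likelihood static trajectory for one parameter dimension.

    means, variances: arrays of shape (T, D) with the per-frame Gaussian
    statistics (static, delta, delta-delta) of the aligned HMM states.
    """
    T, D = means.shape
    if D != len(windows):
        raise ValueError("means must have one column per window")
    W = build_window_matrix(T, windows)
    mu = means.reshape(-1)                 # row D*t + d matches W's layout
    prec = 1.0 / variances.reshape(-1)     # inverse of a diagonal covariance U
    A = W.T @ (prec[:, None] * W)          # W^T U^-1 W (positive definite)
    b = W.T @ (prec * mu)                  # W^T U^-1 mu
    return np.linalg.solve(A, b)


# Toy example: one parameter whose static target jumps from 0 to 1 halfway
# through a 10-frame utterance (two hypothetical HMM states).
T = 10
static_means = np.array([0.0] * 5 + [1.0] * 5)
means = np.zeros((T, 3))
means[:, 0] = static_means          # dynamic-feature means are zero
variances = np.ones((T, 3))
variances[:, 1:] = 0.01             # tight dynamic variances favour a smoother path
trajectory = mlpg(means, variances)
print(np.round(trajectory, 3))
```

Because static and dynamic constraints are solved jointly, the generated path is generally smoother than the piecewise-constant state means; loosening the dynamic variances lets it fall back toward those means. Under the diagonal-covariance assumption, each visual parameter dimension could be generated independently in this way.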

Original language	English
Pages (from-to)	377-396
Number of pages	20
Journal	Multimedia Tools and Applications
Volume	73
Issue number	1
DOIs	https://doi.org/10.1007/s11042-013-1633-3
State	Published - 17 Sep 2014

Keywords

Active appearance model
Facial animation
Hidden Markov model
Talking avatar
Visual speech synthesis

Access to Document

10.1007/s11042-013-1633-3

Cite this

@article{1a577e8ca59b4d2d8f65502d2cdee2d9,

title = "A statistical parametric approach to video-realistic text-driven talking avatar",

abstract = "This paper proposes a statistical parametric approach to video-realistic text-driven talking avatar. We follow the trajectory HMM approach where audio and visual speech are jointly modeled by HMMs and continuous audiovisual speech parameter trajectories are synthesized based on the maximum likelihood criterion. Previous trajectory HMM approaches only focus on mouth animation, which synthesizes simple geometric mouth shapes or video-realistic effects of the lip motion. Our approach uses trajectory HMM to generate visual parameters of the lower face and it realizes video-realistic animation of the whole face. Specifically, we use active appearance model (AAM) to model the visual speech, which offers a convenient and compact statistical model of both the shape and the appearance variations of the face. To realize video-realistic effects with high fidelity, we use Poisson image editing technique to stitch the synthesized lower-face image to a whole face image seamlessly. Objective and subjective experiments show that the proposed approach can produce natural facial animation.",

keywords = "Active appearance model, Facial animation, Hidden Markov model, Talking avatar, Visual speech synthesis",

author = "Lei Xie and Naicai Sun and Bo Fan",

note = "Publisher Copyright: {\textcopyright} 2013, Springer Science+Business Media New York.",

year = "2014",

month = sep,

day = "17",

doi = "10.1007/s11042-013-1633-3",

language = "English",

volume = "73",

pages = "377--396",

journal = "Multimedia Tools and Applications",

issn = "1380-7501",

publisher = "Springer",

number = "1",

}

TY - JOUR

T1 - A statistical parametric approach to video-realistic text-driven talking avatar

AU - Xie, Lei

AU - Sun, Naicai

AU - Fan, Bo

PY - 2014/9/17

Y1 - 2014/9/17

N2 - This paper proposes a statistical parametric approach to video-realistic text-driven talking avatar. We follow the trajectory HMM approach where audio and visual speech are jointly modeled by HMMs and continuous audiovisual speech parameter trajectories are synthesized based on the maximum likelihood criterion. Previous trajectory HMM approaches only focus on mouth animation, which synthesizes simple geometric mouth shapes or video-realistic effects of the lip motion. Our approach uses trajectory HMM to generate visual parameters of the lower face and it realizes video-realistic animation of the whole face. Specifically, we use active appearance model (AAM) to model the visual speech, which offers a convenient and compact statistical model of both the shape and the appearance variations of the face. To realize video-realistic effects with high fidelity, we use Poisson image editing technique to stitch the synthesized lower-face image to a whole face image seamlessly. Objective and subjective experiments show that the proposed approach can produce natural facial animation.

AB - This paper proposes a statistical parametric approach to video-realistic text-driven talking avatar. We follow the trajectory HMM approach where audio and visual speech are jointly modeled by HMMs and continuous audiovisual speech parameter trajectories are synthesized based on the maximum likelihood criterion. Previous trajectory HMM approaches only focus on mouth animation, which synthesizes simple geometric mouth shapes or video-realistic effects of the lip motion. Our approach uses trajectory HMM to generate visual parameters of the lower face and it realizes video-realistic animation of the whole face. Specifically, we use active appearance model (AAM) to model the visual speech, which offers a convenient and compact statistical model of both the shape and the appearance variations of the face. To realize video-realistic effects with high fidelity, we use Poisson image editing technique to stitch the synthesized lower-face image to a whole face image seamlessly. Objective and subjective experiments show that the proposed approach can produce natural facial animation.

KW - Active appearance model

KW - Facial animation

KW - Hidden Markov model

KW - Talking avatar

KW - Visual speech synthesis

UR - http://www.scopus.com/inward/record.url?scp=84919338929&partnerID=8YFLogxK

U2 - 10.1007/s11042-013-1633-3

DO - 10.1007/s11042-013-1633-3

M3 - Article

AN - SCOPUS:84919338929

SN - 1380-7501

VL - 73

SP - 377

EP - 396

JO - Multimedia Tools and Applications

JF - Multimedia Tools and Applications

IS - 1

ER -
