TY - JOUR
T1 - A statistical parametric approach to video-realistic text-driven talking avatar
AU - Xie, Lei
AU - Sun, Naicai
AU - Fan, Bo
N1 - Publisher Copyright: © 2013, Springer Science+Business Media New York.
PY - 2014/9/17
Y1 - 2014/9/17
N2 - This paper proposes a statistical parametric approach to a video-realistic, text-driven talking avatar. We follow the trajectory HMM approach, in which audio and visual speech are jointly modeled by HMMs and continuous audiovisual speech parameter trajectories are synthesized under the maximum likelihood criterion. Previous trajectory HMM approaches focus only on mouth animation, synthesizing either simple geometric mouth shapes or video-realistic lip motion. Our approach uses a trajectory HMM to generate visual parameters of the lower face and realizes video-realistic animation of the whole face. Specifically, we use an active appearance model (AAM) to model visual speech, which offers a convenient and compact statistical model of both the shape and the appearance variations of the face. To achieve high-fidelity video-realistic effects, we use Poisson image editing to stitch the synthesized lower-face image seamlessly into a whole-face image. Objective and subjective experiments show that the proposed approach can produce natural facial animation.
KW - Active appearance model
KW - Facial animation
KW - Hidden Markov model
KW - Talking avatar
KW - Visual speech synthesis
UR - http://www.scopus.com/inward/record.url?scp=84919338929&partnerID=8YFLogxK
U2 - 10.1007/s11042-013-1633-3
DO - 10.1007/s11042-013-1633-3
M3 - Article
AN - SCOPUS:84919338929
SN - 1380-7501
VL - 73
SP - 377
EP - 396
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 1
ER -