Real-time speech driven talking avatar

Bingfeng Li, Lei Xie, Xiangzeng Zhou, Zhonghua Fu, Yanning Zhang

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

This paper presents a real-time speech-driven talking avatar. Unlike most talking avatars, whose speech-synchronized facial animation is generated offline, this avatar can speak with live speech input. Such a life-like talking avatar has many potential applications in videophones, virtual conferencing, audio/video chat, and entertainment. Since phonemes are the smallest units of pronunciation, a real-time phoneme recognizer was built, and a phoneme recognition and output algorithm synchronizes the facial motion with the live input speech. A dynamic viseme generation algorithm models coarticulation effects when computing the facial animation parameters (FAPs) from the recognized phonemes, and the generated FAPs drive an MPEG-4 compliant avatar model. Tests show that the avatar's motion is synchronized and natural, with mean opinion score (MOS) values of 3.42 and 3.5.
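The abstract does not spell out the dynamic viseme generation algorithm, but the general idea of blending neighboring viseme targets to model coarticulation can be sketched as follows. The snippet below is a minimal illustration in Python: the viseme inventory, the toy three-dimensional FAP vectors, and the Gaussian dominance weights (in the spirit of the classic Cohen-Massaro coarticulation model) are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical viseme targets: each viseme maps to a small FAP vector.
# A real MPEG-4 system uses the full FAP set; three values suffice here
# (e.g. jaw opening, lip stretch, lip protrusion) to show the idea.
VISEME_FAPS = {
    "sil": np.array([0.0, 0.0, 0.0]),   # neutral face
    "aa":  np.array([0.9, 0.2, 0.1]),   # open jaw for /a/
    "uw":  np.array([0.3, 0.0, 0.8]),   # rounded lips for /u/
    "m":   np.array([0.0, 0.1, 0.2]),   # closed lips for /m/
}

def coarticulated_faps(phonemes, frames_per_phoneme=5, spread=1.0):
    """Blend neighboring viseme targets with distance-decaying weights,
    a simple stand-in for a coarticulation model (the paper's exact
    algorithm is not given in the abstract)."""
    targets = np.stack([VISEME_FAPS[p] for p in phonemes])
    n = len(phonemes)
    frames = []
    for i in range(n):
        for f in range(frames_per_phoneme):
            t = i + f / frames_per_phoneme  # continuous time in phoneme units
            weights = np.array([np.exp(-((t - j) ** 2) / (2 * spread ** 2))
                                for j in range(n)])
            weights /= weights.sum()
            # Each output frame is a convex combination of viseme targets,
            # so adjacent phonemes influence each other's mouth shapes.
            frames.append(weights @ targets)
    return np.array(frames)

if __name__ == "__main__":
    fap_stream = coarticulated_faps(["sil", "m", "aa", "uw", "sil"])
    print(fap_stream.shape)  # (25, 3): one smoothed FAP vector per frame
```

In a live setting, each FAP frame produced this way would be sent to the MPEG-4 avatar renderer as soon as the recognizer emits a phoneme, which is what keeps the facial motion synchronized with the incoming speech.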

Original language: English
Pages (from-to): 1180-1186
Number of pages: 7
Journal: Qinghua Daxue Xuebao/Journal of Tsinghua University
Volume: 51
Issue number: 9
State: Published - Sep 2011

Keywords

  • Facial animation
  • Talking avatar
  • Visual speech synthesis
