Visual speech animation

Lei Xie, Lijuan Wang, Shan Yang

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

Visual speech animation (VSA) has many potential applications in human-computer interaction, assisted language learning, entertainment, and other areas. But it is one of the most challenging tasks in human motion animation because of the complex mechanisms of speech production and facial motion. This chapter surveys the basic principles, state-of-the-art technologies, and featured applications in this area. Specifically, after introducing the basic concepts and the building blocks of a typical VSA system, we showcase a state-of-the-art approach based on deep bidirectional long short-term memory (DBLSTM) recurrent neural networks (RNNs) for audio-to-visual mapping, which aims to create a video-realistic talking head. Finally, the Engkoo project from Microsoft is highlighted as a practical application of visual speech animation in language learning.
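The audio-to-visual mapping described in the abstract can be illustrated with a minimal sketch. This is not the chapter's implementation: it is a toy bidirectional Elman RNN (rather than an LSTM, and with random, untrained weights) showing only the structural idea — each output frame of visual parameters is predicted from both past and future acoustic context. All dimensions, weights, and function names are illustrative assumptions.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols, scale=0.1):
    # Illustrative random weights; a real system would learn these.
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rnn_pass(frames, w_in, w_rec, hidden_size):
    """One directional recurrent pass; returns a hidden state per frame."""
    h = [0.0] * hidden_size
    states = []
    for x in frames:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
        states.append(h)
    return states

def audio_to_visual(frames, audio_dim=3, hidden=4, visual_dim=2):
    """Map acoustic feature frames to visual parameter frames (toy sketch)."""
    w_in_f, w_rec_f = rand_matrix(hidden, audio_dim), rand_matrix(hidden, hidden)
    w_in_b, w_rec_b = rand_matrix(hidden, audio_dim), rand_matrix(hidden, hidden)
    w_out = rand_matrix(visual_dim, 2 * hidden)
    fwd = rnn_pass(frames, w_in_f, w_rec_f, hidden)
    bwd = rnn_pass(frames[::-1], w_in_b, w_rec_b, hidden)[::-1]
    # Concatenate forward/backward states so each output frame sees both
    # past and future acoustic context, then project to visual parameters.
    return [matvec(w_out, f + b) for f, b in zip(fwd, bwd)]

audio = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]  # 3 acoustic frames
visual = audio_to_visual(audio)
print(len(visual), len(visual[0]))  # one visual-parameter frame per acoustic frame
```

In a real DBLSTM system the recurrent cells are LSTM units, the weights are trained on parallel audio-visual corpora, and the predicted visual parameters drive a talking-head renderer.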

Original language: English
Title of host publication: Handbook of Human Motion
Publisher: Springer International Publishing
Pages: 2115-2144
Number of pages: 30
Volume: 3-3
ISBN (Electronic): 9783319144184
ISBN (Print): 9783319144177
DOIs
State: Published - 4 Apr 2018

Keywords

  • Audio visual speech
  • Audio-to-visual mapping
  • Deep learning
  • Deep neural network
  • Facial animation
  • Talking avatar
  • Talking face
  • Talking head
  • Visual speech animation
  • Visual speech synthesis
