Visual speech animation

Lei Xie, Lijuan Wang, Shan Yang

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

Visual speech animation (VSA) has many potential applications in human-computer interaction, assisted language learning, entertainment, and other areas. But it is one of the most challenging tasks in human motion animation because of the complex mechanisms of speech production and facial motion. This chapter surveys the basic principles, state-of-the-art technologies, and featured applications in this area. Specifically, after introducing the basic concepts and the building blocks of a typical VSA system, we showcase a state-of-the-art approach based on deep bidirectional long short-term memory (DBLSTM) recurrent neural networks (RNNs) for audio-to-visual mapping, which aims to create a video-realistic talking head. Finally, the Engkoo project from Microsoft is highlighted as a practical application of visual speech animation in language learning.
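The audio-to-visual mapping described in the abstract can be illustrated with a minimal sketch. This is not the chapter's implementation: it is a toy bidirectional Elman RNN (rather than an LSTM, and with random, untrained weights) showing only the structural idea — each output frame of visual parameters is predicted from both past and future acoustic context. All dimensions, weights, and function names are illustrative assumptions.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols, scale=0.1):
    # Illustrative random weights; a real system would learn these.
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rnn_pass(frames, w_in, w_rec, hidden_size):
    """One directional recurrent pass; returns a hidden state per frame."""
    h = [0.0] * hidden_size
    states = []
    for x in frames:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
        states.append(h)
    return states

def audio_to_visual(frames, audio_dim=3, hidden=4, visual_dim=2):
    """Map acoustic feature frames to visual parameter frames (toy sketch)."""
    w_in_f, w_rec_f = rand_matrix(hidden, audio_dim), rand_matrix(hidden, hidden)
    w_in_b, w_rec_b = rand_matrix(hidden, audio_dim), rand_matrix(hidden, hidden)
    w_out = rand_matrix(visual_dim, 2 * hidden)
    fwd = rnn_pass(frames, w_in_f, w_rec_f, hidden)
    bwd = rnn_pass(frames[::-1], w_in_b, w_rec_b, hidden)[::-1]
    # Concatenate forward/backward states so each output frame sees both
    # past and future acoustic context, then project to visual parameters.
    return [matvec(w_out, f + b) for f, b in zip(fwd, bwd)]

audio = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]  # 3 acoustic frames
visual = audio_to_visual(audio)
print(len(visual), len(visual[0]))  # one visual-parameter frame per acoustic frame
```

In a real DBLSTM system the recurrent cells are LSTM units, the weights are trained on parallel audio-visual corpora, and the predicted visual parameters drive a talking-head renderer.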

Original language: English
Title of host publication: Handbook of Human Motion
Publisher: Springer International Publishing
Pages: 2115-2144
Number of pages: 30
Volume: 3-3
ISBN (Electronic): 9783319144184
ISBN (Print): 9783319144177
DOIs
State: Published - 4 Apr 2018

Keywords

  • Audio visual speech
  • Audio-to-visual mapping
  • Deep learning
  • Deep neural network
  • Facial animation
  • Talking avatar
  • Talking face
  • Talking head
  • Visual speech animation
  • Visual speech synthesis
