Context dependent viseme models for voice driven animation

Xie Lei, Jiang Dongmei, I. Ravyse, W. Verhelst, H. Sahli, V. Slavova, Z. Rongchun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

This paper addresses the problem of animating a talking figure, such as an avatar, using speech input only. The system that was developed is based on hidden Markov models for the acoustic observation vectors of the speech sounds that correspond to each of 16 visually distinct mouth shapes (visemes). The acoustic variability with context was taken into account by building acoustic viseme models that are dependent on the left and right viseme contexts. Our experimental results show that it is indeed possible to obtain visually relevant speech segmentation data directly from the purely acoustic speech signal.
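The context-dependent modeling described above is analogous to triphone modeling in speech recognition: each viseme gets a separate acoustic model for each left/right viseme context. As a minimal sketch of that idea, the snippet below expands a viseme sequence into context-dependent model labels. The "L-C+R" label convention, the `sil` boundary symbol, and the `V_*` viseme names are illustrative assumptions, not the authors' exact notation.

```python
# Sketch: deriving context-dependent model labels from a viseme sequence.
# The paper trains acoustic HMMs per viseme conditioned on the left and
# right viseme context; the labeling scheme here is a common triphone-style
# convention assumed for illustration.

def context_dependent_labels(visemes):
    """Map a viseme sequence to left/right context-dependent model names."""
    labels = []
    for i, v in enumerate(visemes):
        left = visemes[i - 1] if i > 0 else "sil"              # assumed silence at utterance edges
        right = visemes[i + 1] if i < len(visemes) - 1 else "sil"
        labels.append(f"{left}-{v}+{right}")
    return labels

# e.g. for a hypothetical three-viseme utterance:
print(context_dependent_labels(["V_m", "V_a", "V_p"]))
# → ['sil-V_m+V_a', 'V_m-V_a+V_p', 'V_a-V_p+sil']
```

With a 16-viseme inventory, exhaustive left/right conditioning yields up to 16 × 16 × 16 context-dependent models, which is why such systems typically tie or cluster rarely observed contexts.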

Original language: English
Title of host publication: Proceedings EC-VIP-MC 2003 - 4th EURASIP Conference Focused on Video / Image Processing and Multimedia Communications
Editors: Sonja Grgic, Mislav Grgic
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 649-654
Number of pages: 6
ISBN (Electronic): 9531840547, 9789531840545
DOIs
State: Published - 2003
Event: 4th EURASIP Conference Focused on Video / Image Processing and Multimedia Communications, EC-VIP-MC 2003 - Zagreb, Croatia
Duration: 2 Jul 2003 - 5 Jul 2003

Publication series

Name: Proceedings EC-VIP-MC 2003 - 4th EURASIP Conference Focused on Video / Image Processing and Multimedia Communications
Volume: 2

Conference

Conference: 4th EURASIP Conference Focused on Video / Image Processing and Multimedia Communications, EC-VIP-MC 2003
Country/Territory: Croatia
City: Zagreb
Period: 2/07/03 - 5/07/03

Keywords

  • Animation
  • Automatic speech recognition
  • Avatars
  • Context modeling
  • Hidden Markov models
  • Mouth
  • Robustness
  • Shape
  • Speech processing
  • Speech recognition
