Accurate visual speech synthesis based on diviseme unit selection and concatenation

Dongmei Jiang, Ilse Ravyse, Hichem Sahli, Yanning Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

This paper presents a novel speech-driven approach to accurate, realistic visual speech synthesis. First, an audio-visual instance database is built for different viseme context combinations, i.e. diviseme units, using 100 audio-visual speech sentences from a female speaker. Then a diviseme instance selection algorithm is introduced to choose the optimal diviseme instances for the viseme contexts in the input speech, considering the concatenation smoothness of the image sequences, the matching of the mouth movements to the acoustic pronunciation process, and the intensity of the input speech. Finally, the mouth image sequences of corresponding viseme segments in the selected diviseme instances are time-warped and blended to construct the mouth images of the final animation. Visual speech synthesis experiments and subjective evaluation results show that the resulting mouth animations are not only realistic, with clear and smooth mouth images, but also in good accordance with the acoustic pronunciation and intensity of the input speech.
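
The selection step described in the abstract can be illustrated with a generic unit-selection search. The sketch below is not the authors' algorithm: the abstract does not say how the costs are defined or how the search is carried out. It assumes a standard dynamic-programming (Viterbi-style) formulation in which each diviseme slot has several candidate instances, a hypothetical `target_cost` scores how well a candidate fits the input speech (for example pronunciation and intensity match), and a hypothetical `join_cost` scores how smoothly two consecutive candidates concatenate. All names and the toy data are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_units(
    candidates: Sequence[Sequence[T]],
    target_cost: Callable[[int, T], float],
    join_cost: Callable[[T, T], float],
) -> Tuple[List[T], float]:
    """Pick one candidate per slot so that the summed cost is minimal.

    candidates[i] holds the candidate instances for slot i (hypothetical
    stand-ins for diviseme instances). The total cost of a path is the sum
    of target costs of the chosen candidates plus the join costs between
    consecutive choices.
    """
    if not candidates or any(len(slot) == 0 for slot in candidates):
        raise ValueError("every slot needs at least one candidate")

    # best[j]: cheapest cost of any path ending in candidate j of the current slot
    best = [target_cost(0, c) for c in candidates[0]]
    back: List[List[int]] = []  # back[i-1][j]: best predecessor index for slot i

    for i in range(1, len(candidates)):
        new_best: List[float] = []
        pointers: List[int] = []
        for cand in candidates[i]:
            costs = [
                best[k] + join_cost(prev, cand)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + target_cost(i, cand))
            pointers.append(k_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path back from the last slot to the first.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()

    return [candidates[i][j] for i, j in enumerate(path)], total


if __name__ == "__main__":
    # Toy data: each candidate is (mouth opening at start, mouth opening at end).
    slots = [[(0.0, 1.0), (0.0, 5.0)], [(1.0, 2.0), (4.0, 2.0)]]
    # Smoothness only: penalise a jump in mouth opening at the boundary.
    units, total = select_units(
        slots,
        target_cost=lambda i, c: 0.0,
        join_cost=lambda prev, cur: abs(prev[1] - cur[0]),
    )
    print(units, total)
```

With smoothness as the only criterion, the toy run picks the pair whose boundary mouth openings agree. In a real system the target cost would carry the acoustic and intensity matching that the abstract mentions, and the selected image sequences would then still need to be time-warped and blended.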

Original language: English
Title of host publication: Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008
Pages: 906-909
Number of pages: 4
State: Published - 2008
Event: 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008 - Cairns, QLD, Australia
Duration: 8 Oct 2008 → 10 Oct 2008

Publication series

Name: Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008

Conference

Conference: 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008
Country/Territory: Australia
City: Cairns, QLD
Period: 8/10/08 → 10/10/08
