The viseme based continuous speech recognition system for a talking head

Dong Mei Jiang, Lei Xie, Ilse Ravyse, Rong Chun Zhao, Hichem Sahli, Jan Cornelis

Research output: Contribution to journal › Article › peer-review

Abstract

This paper presents a continuous speech recognition system for a talking head, based on HMMs of visemes (the basic speech units in the visual domain), which segments speech into mouth-shape sequences with timing boundaries. Trisemes are formalized to capture viseme contexts. Based on the 3D talking head images, a viseme similarity weight (VSW) is defined, and 166 visual questions are designed for building the triseme decision trees, which tie the states of trisemes with similar contexts so that they can share the same parameters. For system evaluation, besides the recognition rate, an image-related measure, the 'viseme similarity weighted accuracy', accounts for mismatches between the recognized viseme sequence and its reference, while 'jerky points' in the liprounding and VSW graphs help evaluate the smoothness of the resulting viseme image sequences. Results show that the viseme-based speech recognition system yields smoother and more plausible mouth shapes.
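To illustrate the idea of a similarity-weighted evaluation, the following is a minimal sketch of how a 'viseme similarity weighted accuracy' could be computed: recognized and reference viseme sequences are aligned by dynamic programming, and substitutions between visually similar visemes are penalized less than substitutions between dissimilar ones. The function name, the pairwise similarity values, and the exact scoring rule are illustrative assumptions, not the definition used in the paper.

```python
# Hypothetical sketch of a viseme similarity weighted accuracy (VSWA) metric.
# The similarity table and normalization are assumptions for illustration only.
from typing import Dict, List, Tuple


def vswa(reference: List[str],
         recognized: List[str],
         similarity: Dict[Tuple[str, str], float]) -> float:
    """Align two viseme sequences by dynamic programming and score each
    aligned pair by its visual similarity (1.0 = identical mouth shape).
    Insertions and deletions contribute 0. The result lies in [0, 1]."""
    def sim(a: str, b: str) -> float:
        if a == b:
            return 1.0
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    n, m = len(reference), len(recognized)
    # dp[i][j]: best accumulated similarity for reference[:i] vs recognized[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j],                                    # deletion
                dp[i][j - 1],                                    # insertion
                dp[i - 1][j - 1] + sim(reference[i - 1],
                                       recognized[j - 1]),       # (mis)match
            )
    return dp[n][m] / max(n, 1)


# Example: /p/, /b/, /m/ share a bilabial mouth shape, so confusing them
# should cost less than confusing /p/ with an open-mouth viseme like /aa/.
sim_table = {("p", "b"): 0.9, ("p", "m"): 0.8, ("p", "aa"): 0.1}
print(vswa(["p", "aa", "t"], ["b", "aa", "t"], sim_table))  # close to 1.0
```

Under this sketch, a recognizer that substitutes one viseme for a visually near-identical one is barely penalized, which is the intuition behind evaluating recognition quality in the image domain rather than by phoneme identity alone.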

Original language: English
Pages (from-to): 375-381
Number of pages: 7
Journal: Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology
Volume: 26
Issue number: 3
State: Published - Mar 2004

Keywords

  • Liprounding and VSW graphs
  • Talking head
  • Triseme decision trees
  • Viseme
  • Viseme similarity weighted accuracy
