The viseme based continuous speech recognition system for a talking head

Dong Mei Jiang, Lei Xie, Ilse Ravyse, Rong Chun Zhao, Hichem Sahli, Jan Cornelis

Research output: Contribution to journal › Article › peer-review

Abstract

This paper presents a continuous speech recognition system for a talking head, based on HMMs of visemes (the basic speech units in the visual domain), which segments speech into mouth-shape sequences with timing boundaries. Trisemes are formalized to capture viseme contexts. Based on the 3D talking head images, a viseme similarity weight (VSW) is defined, and 166 visual questions are designed for building the triseme decision trees, which tie the states of trisemes with similar contexts so that they can share the same parameters. For system evaluation, besides the recognition rate, an image-related measure, the 'viseme similarity weighted accuracy', accounts for mismatches between the recognized viseme sequence and its reference, while 'jerky points' in the liprounding and VSW graphs help evaluate the smoothness of the resulting viseme image sequences. Results show that the viseme-based speech recognition system yields smoother and more plausible mouth shapes.
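To illustrate the idea of a similarity-weighted evaluation, the following is a minimal sketch of how a 'viseme similarity weighted accuracy' could be computed: recognized and reference viseme sequences are aligned by dynamic programming, and substitutions between visually similar visemes are penalized less than substitutions between dissimilar ones. The function name, the pairwise similarity values, and the exact scoring rule are illustrative assumptions, not the definition used in the paper.

```python
# Hypothetical sketch of a viseme similarity weighted accuracy (VSWA) metric.
# The similarity table and normalization are assumptions for illustration only.
from typing import Dict, List, Tuple


def vswa(reference: List[str],
         recognized: List[str],
         similarity: Dict[Tuple[str, str], float]) -> float:
    """Align two viseme sequences by dynamic programming and score each
    aligned pair by its visual similarity (1.0 = identical mouth shape).
    Insertions and deletions contribute 0. The result lies in [0, 1]."""
    def sim(a: str, b: str) -> float:
        if a == b:
            return 1.0
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    n, m = len(reference), len(recognized)
    # dp[i][j]: best accumulated similarity for reference[:i] vs recognized[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j],                                    # deletion
                dp[i][j - 1],                                    # insertion
                dp[i - 1][j - 1] + sim(reference[i - 1],
                                       recognized[j - 1]),       # (mis)match
            )
    return dp[n][m] / max(n, 1)


# Example: /p/, /b/, /m/ share a bilabial mouth shape, so confusing them
# should cost less than confusing /p/ with an open-mouth viseme like /aa/.
sim_table = {("p", "b"): 0.9, ("p", "m"): 0.8, ("p", "aa"): 0.1}
print(vswa(["p", "aa", "t"], ["b", "aa", "t"], sim_table))  # close to 1.0
```

Under this sketch, a recognizer that substitutes one viseme for a visually near-identical one is barely penalized, which is the intuition behind evaluating recognition quality in the image domain rather than by phoneme identity alone.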

Original language: English
Pages (from-to): 375-381
Number of pages: 7
Journal: Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology
Volume: 26
Issue number: 3
State: Published - Mar 2004

Keywords

  • Liprounding and VSW graphs
  • Talking head
  • Triseme decision trees
  • Viseme
  • Viseme similarity weighted accuracy
