TY - JOUR
T1 - The viseme based continuous speech recognition system for a talking head
AU - Jiang, Dong Mei
AU - Xie, Lei
AU - Ravyse, Ilse
AU - Zhao, Rong Chun
AU - Sahli, Hichem
AU - Cornelis, Jan
PY - 2004/3
Y1 - 2004/3
N2 - This paper presents a continuous speech recognition system for a talking head. The system is based on HMMs of visemes (the basic speech units in the visual domain) and segments speech into mouth shape sequences with timing boundaries. Trisemes are formalized to take viseme contexts into account. Based on the 3D talking head images, the viseme similarity weight (VSW) is defined, and 166 visual questions are designed for building the triseme decision trees, which tie the states of trisemes with similar contexts so that they share the same parameters. For system evaluation, besides the recognition rate, an image-related measure, the 'viseme similarity weighted accuracy', accounts for mismatches between the recognized viseme sequence and its reference, and 'jerky points' in the liprounding and VSW graphs help evaluate the smoothness of the resulting viseme image sequences. Results show that the viseme-based speech recognition system gives smoother and more plausible mouth shapes.
AB - This paper presents a continuous speech recognition system for a talking head. The system is based on HMMs of visemes (the basic speech units in the visual domain) and segments speech into mouth shape sequences with timing boundaries. Trisemes are formalized to take viseme contexts into account. Based on the 3D talking head images, the viseme similarity weight (VSW) is defined, and 166 visual questions are designed for building the triseme decision trees, which tie the states of trisemes with similar contexts so that they share the same parameters. For system evaluation, besides the recognition rate, an image-related measure, the 'viseme similarity weighted accuracy', accounts for mismatches between the recognized viseme sequence and its reference, and 'jerky points' in the liprounding and VSW graphs help evaluate the smoothness of the resulting viseme image sequences. Results show that the viseme-based speech recognition system gives smoother and more plausible mouth shapes.
KW - Liprounding and VSW graphs
KW - Talking head
KW - Triseme decision trees
KW - Viseme
KW - Viseme similarity weighted accuracy
UR - http://www.scopus.com/inward/record.url?scp=3042732279&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:3042732279
SN - 1009-5896
VL - 26
SP - 375
EP - 381
JO - Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology
JF - Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology
IS - 3
ER -