Triseme decision trees in the continuous speech recognition system for a talking head

Dong Mei Jiang; Lei Xie; Ilse Ravyse; Rong Chun Zhao; Hichem Sahli; Jan Cornelis

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

3 Citations (Scopus)

Abstract

In this paper, we present a viseme-based continuous speech recognition system (a viseme being the basic speech unit in the visual domain) that segments speech into viseme sequences with timing boundaries to drive a talking head. In viseme Hidden Markov Model (HMM) training, the instances of a viseme in different contexts are formulated as trisemes. Based on the mouth-shape parameter Liprounding and the viseme similarity weight (VSW) defined from the 3D viseme facial models, 166 questions about viseme contexts are designed to build triseme decision trees, which tie the states of trisemes with similar contexts so that they share the same parameters. To evaluate system performance, image-related measurements are also applied to the resulting viseme sequences, with 'jerky instances' in the Liprounding and VSW graphs used to assess their smoothness. Results show that, compared with the phoneme-based system, the tied-state triseme-based speech recognition system produces talking-head animation with smoother and more plausible mouth shapes.
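The abstract describes tying triseme HMM states by asking context questions in a decision tree. The sketch below illustrates the generic likelihood-gain criterion used for phonetic decision-tree state tying (as applied to triphones in HTK), transferred to hypothetical triseme states. It is not the authors' implementation: the `l-c+r` state naming, the viseme labels, the example questions (stand-ins for the paper's 166 Liprounding/VSW-based questions), the single diagonal Gaussian per state, and the thresholds `min_gain` and `min_occ` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StateStats:
    """Occupancy-weighted single-Gaussian statistics for one triseme HMM state."""
    name: str          # hypothetical "left-centre+right" label, e.g. "p-o+a"
    occ: float         # state occupancy count (gamma)
    mean: List[float]
    var: List[float]   # diagonal covariance


# A question is a (label, predicate-on-state-name) pair.
# ASSUMPTION: these stand in for the paper's Liprounding/VSW-based questions.
Question = Tuple[str, Callable[[str], bool]]


def left_in(ctx: set) -> Callable[[str], bool]:
    """True if the left context viseme of 'l-c+r' is in ctx."""
    return lambda name: name.split("-")[0] in ctx


def right_in(ctx: set) -> Callable[[str], bool]:
    """True if the right context viseme of 'l-c+r' is in ctx."""
    return lambda name: name.split("+")[1] in ctx


def pooled_loglik(states: List[StateStats]) -> float:
    """Approximate log-likelihood of the training frames if all states share one diagonal Gaussian."""
    occ = sum(s.occ for s in states)
    dim = len(states[0].mean)
    log_det = 0.0
    for d in range(dim):
        mu = sum(s.occ * s.mean[d] for s in states) / occ
        second = sum(s.occ * (s.var[d] + s.mean[d] ** 2) for s in states) / occ
        log_det += math.log(max(second - mu ** 2, 1e-8))  # floor avoids log(0)
    return -0.5 * occ * (dim * (1.0 + math.log(2.0 * math.pi)) + log_det)


def build_tree(states: List[StateStats], questions: List[Question],
               min_gain: float, min_occ: float) -> List[List[StateStats]]:
    """Greedy top-down splitting; each returned leaf cluster becomes one tied state."""
    base = pooled_loglik(states)
    best = None
    for qname, pred in questions:
        yes = [s for s in states if pred(s.name)]
        no = [s for s in states if not pred(s.name)]
        if not yes or not no:
            continue  # question does not split this node
        if sum(s.occ for s in yes) < min_occ or sum(s.occ for s in no) < min_occ:
            continue  # each child must keep enough training data
        gain = pooled_loglik(yes) + pooled_loglik(no) - base
        if best is None or gain > best[0]:
            best = (gain, qname, yes, no)
    if best is None or best[0] < min_gain:
        return [states]  # leaf: all states here share parameters
    _, _, yes, no = best
    return (build_tree(yes, questions, min_gain, min_occ)
            + build_tree(no, questions, min_gain, min_occ))


if __name__ == "__main__":
    # Hypothetical 1-D statistics for the centre viseme "o" in four left contexts.
    states = [
        StateStats("p-o+a", 10.0, [0.0], [1.0]),
        StateStats("b-o+a", 10.0, [0.1], [1.0]),
        StateStats("u-o+a", 10.0, [5.0], [1.0]),
        StateStats("w-o+a", 10.0, [5.1], [1.0]),
    ]
    questions = [
        ("L_bilabial", left_in({"p", "b", "m"})),
        ("L_rounded", left_in({"u", "w", "o"})),
        ("R_open", right_in({"a"})),
    ]
    for cluster in build_tree(states, questions, min_gain=1.0, min_occ=5.0):
        print([s.name for s in cluster])
```

Because each split is chosen greedily by likelihood gain, the question set decides which contexts are allowed to share parameters, while the training data decide which questions are actually used; `min_occ` keeps every tied state backed by enough frames.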

Original language	English
Title of host publication	Proceedings of 2002 International Conference on Machine Learning and Cybernetics
Pages	2097-2101
Number of pages	5
Publication status	Published - 2002
Event	Proceedings of 2002 International Conference on Machine Learning and Cybernetics - Beijing, China. Duration: 4 Nov 2002 → 5 Nov 2002

Publication series

Name	Proceedings of 2002 International Conference on Machine Learning and Cybernetics
Volume	4

Conference

Conference	Proceedings of 2002 International Conference on Machine Learning and Cybernetics
Country/Territory	China
City	Beijing
Period	4/11/02 → 5/11/02

Cite this

@inproceedings{59acff0c039042588b5e3f02ac6218f7,

title = "Triseme decision trees in the continuous speech recognition system for a talking head",

abstract = "In this paper, we present a viseme-based continuous speech recognition system (a viseme being the basic speech unit in the visual domain) that segments speech into viseme sequences with timing boundaries to drive a talking head. In viseme Hidden Markov Model (HMM) training, the instances of a viseme in different contexts are formulated as trisemes. Based on the mouth-shape parameter Liprounding and the viseme similarity weight (VSW) defined from the 3D viseme facial models, 166 questions about viseme contexts are designed to build triseme decision trees, which tie the states of trisemes with similar contexts so that they share the same parameters. To evaluate system performance, image-related measurements are also applied to the resulting viseme sequences, with 'jerky instances' in the Liprounding and VSW graphs used to assess their smoothness. Results show that, compared with the phoneme-based system, the tied-state triseme-based speech recognition system produces talking-head animation with smoother and more plausible mouth shapes.",

keywords = "Jerky instances, Liprounding, Triseme decision tree, Viseme, Viseme similarity weight",

author = "Jiang, {Dong Mei} and Lei Xie and Ilse Ravyse and Zhao, {Rong Chun} and Hichem Sahli and Jan Cornelis",

year = "2002",

language = "English",

isbn = "0780375084",

series = "Proceedings of 2002 International Conference on Machine Learning and Cybernetics",

pages = "2097--2101",

booktitle = "Proceedings of 2002 International Conference on Machine Learning and Cybernetics",

note = "Proceedings of 2002 International Conference on Machine Learning and Cybernetics ; Conference date: 04-11-2002 Through 05-11-2002",

}

Jiang, DM, Xie, L, Ravyse, I, Zhao, RC, Sahli, H & Cornelis, J 2002, Triseme decision trees in the continuous speech recognition system for a talking head. in Proceedings of 2002 International Conference on Machine Learning and Cybernetics. Proceedings of 2002 International Conference on Machine Learning and Cybernetics, vol. 4, pp. 2097-2101, Proceedings of 2002 International Conference on Machine Learning and Cybernetics, Beijing, China, 4/11/02.

Triseme decision trees in the continuous speech recognition system for a talking head. / Jiang, Dong Mei; Xie, Lei; Ravyse, Ilse et al.
Proceedings of 2002 International Conference on Machine Learning and Cybernetics. 2002. p. 2097-2101 (Proceedings of 2002 International Conference on Machine Learning and Cybernetics; Vol. 4).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

TY - GEN

T1 - Triseme decision trees in the continuous speech recognition system for a talking head

AU - Jiang, Dong Mei

AU - Xie, Lei

AU - Ravyse, Ilse

AU - Zhao, Rong Chun

AU - Sahli, Hichem

AU - Cornelis, Jan

PY - 2002

Y1 - 2002

N2 - In this paper, we present a viseme-based continuous speech recognition system (a viseme being the basic speech unit in the visual domain) that segments speech into viseme sequences with timing boundaries to drive a talking head. In viseme Hidden Markov Model (HMM) training, the instances of a viseme in different contexts are formulated as trisemes. Based on the mouth-shape parameter Liprounding and the viseme similarity weight (VSW) defined from the 3D viseme facial models, 166 questions about viseme contexts are designed to build triseme decision trees, which tie the states of trisemes with similar contexts so that they share the same parameters. To evaluate system performance, image-related measurements are also applied to the resulting viseme sequences, with 'jerky instances' in the Liprounding and VSW graphs used to assess their smoothness. Results show that, compared with the phoneme-based system, the tied-state triseme-based speech recognition system produces talking-head animation with smoother and more plausible mouth shapes.

AB - In this paper, we present a viseme-based continuous speech recognition system (a viseme being the basic speech unit in the visual domain) that segments speech into viseme sequences with timing boundaries to drive a talking head. In viseme Hidden Markov Model (HMM) training, the instances of a viseme in different contexts are formulated as trisemes. Based on the mouth-shape parameter Liprounding and the viseme similarity weight (VSW) defined from the 3D viseme facial models, 166 questions about viseme contexts are designed to build triseme decision trees, which tie the states of trisemes with similar contexts so that they share the same parameters. To evaluate system performance, image-related measurements are also applied to the resulting viseme sequences, with 'jerky instances' in the Liprounding and VSW graphs used to assess their smoothness. Results show that, compared with the phoneme-based system, the tied-state triseme-based speech recognition system produces talking-head animation with smoother and more plausible mouth shapes.

KW - Jerky instances

KW - Liprounding

KW - Triseme decision tree

KW - Viseme

KW - Viseme similarity weight

UR - http://www.scopus.com/inward/record.url?scp=0036921857&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0036921857

SN - 0780375084

T3 - Proceedings of 2002 International Conference on Machine Learning and Cybernetics

SP - 2097

EP - 2101

BT - Proceedings of 2002 International Conference on Machine Learning and Cybernetics

T2 - Proceedings of 2002 International Conference on Machine Learning and Cybernetics

Y2 - 4 November 2002 through 5 November 2002

ER -