Speech animation using coupled hidden Markov models

Lei Xie, Zhi Qiang Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Citations (Scopus)

Abstract

We present a novel speech animation approach using coupled hidden Markov models (CHMMs). Unlike conventional HMMs, which model audio-visual speech with a single state chain under tight inter-modal synchronization, we use CHMMs to model the asynchrony, the differing discriminative abilities, and the temporal coupling between the audio and visual speech streams, all of which are important for natural-looking animation. Based on the audio-visual CHMMs, visual animation parameters are predicted from audio through an EM-based audio-to-visual conversion algorithm. Experiments on the JEWEL AV database show that, compared with conventional HMMs, the CHMMs output visual parameters much closer to the actual ones. Explicit modelling of audio-visual speech is thus promising for speech animation.
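The coupled structure described in the abstract, in which each modality's hidden state at time t depends on both modalities' states at time t-1, can be illustrated with a small numerical sketch. The following toy example (not the authors' implementation; all sizes, parameters, and observation symbols are made up for illustration) builds a two-chain discrete CHMM and evaluates the joint log-likelihood with a forward pass over the product state space:

```python
import numpy as np

rng = np.random.default_rng(0)
nA, nV = 2, 2  # toy numbers of hidden states in the audio and visual chains

def random_cpt(shape):
    """Random conditional probability table, normalized over the last axis."""
    m = rng.random(shape)
    return m / m.sum(axis=-1, keepdims=True)

# Coupled transitions: each chain's next state conditions on BOTH previous states.
# A_a[i, j, k] = P(a_t = k | a_{t-1} = i, v_{t-1} = j); A_v analogously.
A_a = random_cpt((nA, nV, nA))
A_v = random_cpt((nA, nV, nV))
pi_a = np.full(nA, 1.0 / nA)   # uniform initial distributions
pi_v = np.full(nV, 1.0 / nV)

# Per-chain discrete emissions (3 audio symbols, 3 visual symbols).
B_a = random_cpt((nA, 3))
B_v = random_cpt((nV, 3))

def chmm_loglik(obs_a, obs_v):
    """Log-likelihood via a scaled forward pass over joint states (a, v)."""
    alpha = (pi_a[:, None] * pi_v[None, :]
             * B_a[:, obs_a[0]][:, None] * B_v[:, obs_v[0]][None, :])
    ll = 0.0
    for t in range(1, len(obs_a)):
        c = alpha.sum()          # scale to avoid underflow on long sequences
        ll += np.log(c)
        alpha /= c
        # Transition: sum over previous joint state (i, j); the two chains'
        # transitions factorize given the full previous joint state.
        alpha = np.einsum('ij,ijk,ijl->kl', alpha, A_a, A_v)
        alpha *= B_a[:, obs_a[t]][:, None] * B_v[:, obs_v[t]][None, :]
    return ll + np.log(alpha.sum())

obs_a = [0, 1, 2, 1]  # toy audio observation sequence
obs_v = [2, 0, 1, 1]  # toy visual observation sequence
ll = chmm_loglik(obs_a, obs_v)
print(ll)
```

Because each chain conditions on the other chain's previous state rather than sharing a single state chain, the two streams can drift out of lockstep, which is exactly the asynchrony the paper argues matters for natural animation; exact inference, as here, runs over the product state space.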

Original language: English
Title of host publication: Proceedings - 18th International Conference on Pattern Recognition, ICPR 2006
Pages: 1128-1131
Number of pages: 4
DOI
Publication status: Published - 2006
Externally published: Yes
Event: 18th International Conference on Pattern Recognition, ICPR 2006 - Hong Kong, China
Duration: 20 Aug 2006 - 24 Aug 2006

Publication series

Name: Proceedings - International Conference on Pattern Recognition
Volume: 1
ISSN (Print): 1051-4651

Conference

Conference: 18th International Conference on Pattern Recognition, ICPR 2006
Country/Territory: China
City: Hong Kong
Period: 20/08/06 - 24/08/06
