Context dependent viseme models for voice driven animation

Xie Lei, Jiang Dongmei, I. Ravyse, W. Verhelst, H. Sahli, V. Slavova, Z. Rongchun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Citations (Scopus)

Abstract

This paper addresses the problem of animating a talking figure, such as an avatar, using speech input only. The system that was developed is based on hidden Markov models for the acoustic observation vectors of the speech sounds that correspond to each of 16 visually distinct mouth shapes (visemes). The acoustic variability with context was taken into account by building acoustic viseme models that are dependent on the left and right viseme contexts. Our experimental results show that it is indeed possible to obtain visually relevant speech segmentation data directly from the purely acoustic speech signal.
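The context-dependent acoustic viseme models described in the abstract are analogous to triphone models in speech recognition: each of the 16 visemes gets a separate HMM per (left, right) viseme context. A minimal sketch of how such context-dependent units could be derived from a viseme sequence — the "left-centre+right" naming and the `sil` boundary label are illustrative assumptions, not the paper's notation:

```python
# Sketch: deriving context-dependent viseme units from a viseme label
# sequence, analogous to triphone modelling. Unit naming ("L-C+R") and
# the "sil" boundary label are illustrative assumptions.

def context_dependent_units(visemes, boundary="sil"):
    """Map a viseme label sequence to left/right context-dependent units."""
    padded = [boundary] + list(visemes) + [boundary]
    units = []
    for i in range(1, len(padded) - 1):
        left, centre, right = padded[i - 1], padded[i], padded[i + 1]
        # each centre viseme is modelled separately per (left, right) context
        units.append(f"{left}-{centre}+{right}")
    return units

# Example: three mouth shapes in a short utterance
print(context_dependent_units(["V3", "V7", "V1"]))
# ['sil-V3+V7', 'V3-V7+V1', 'V7-V1+sil']
```

In practice the number of distinct contexts grows quadratically with the viseme inventory, which is why such systems typically tie or cluster context-dependent models trained on limited data.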

Original language: English
Title of host publication: Proceedings EC-VIP-MC 2003 - 4th EURASIP Conference Focused on Video / Image Processing and Multimedia Communications
Editors: Sonja Grgic, Mislav Grgic
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 649-654
Number of pages: 6
ISBN (electronic): 9531840547, 9789531840545
Publication status: Published - 2003
Event: 4th EURASIP Conference Focused on Video / Image Processing and Multimedia Communications, EC-VIP-MC 2003 - Zagreb, Croatia
Duration: 2 Jul 2003 - 5 Jul 2003

Publication series

Name: Proceedings EC-VIP-MC 2003 - 4th EURASIP Conference Focused on Video / Image Processing and Multimedia Communications
Volume: 2

Conference

Conference: 4th EURASIP Conference Focused on Video / Image Processing and Multimedia Communications, EC-VIP-MC 2003
Country/Territory: Croatia
City: Zagreb
Period: 2/07/03 - 5/07/03
