Context dependent viseme models for voice driven animation

Xie Lei, Jiang Dongmei, I. Ravyse, W. Verhelst, H. Sahli, V. Slavova, Z. Rongchun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

This paper addresses the problem of animating a talking figure, such as an avatar, using speech input only. The system that was developed is based on hidden Markov models for the acoustic observation vectors of the speech sounds that correspond to each of 16 visually distinct mouth shapes (visemes). The acoustic variability with context was taken into account by building acoustic viseme models that are dependent on the left and right viseme contexts. Our experimental results show that it is indeed possible to obtain visually relevant speech segmentation data directly from the purely acoustic speech signal.
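The context-dependent modeling described above is analogous to triphone modeling in speech recognition: each viseme gets a separate acoustic model for each left/right viseme context. As a minimal sketch of that idea, the snippet below expands a viseme sequence into context-dependent model labels. The "L-C+R" label convention, the `sil` boundary symbol, and the `V_*` viseme names are illustrative assumptions, not the authors' exact notation.

```python
# Sketch: deriving context-dependent model labels from a viseme sequence.
# The paper trains acoustic HMMs per viseme conditioned on the left and
# right viseme context; the labeling scheme here is a common triphone-style
# convention assumed for illustration.

def context_dependent_labels(visemes):
    """Map a viseme sequence to left/right context-dependent model names."""
    labels = []
    for i, v in enumerate(visemes):
        left = visemes[i - 1] if i > 0 else "sil"              # assumed silence at utterance edges
        right = visemes[i + 1] if i < len(visemes) - 1 else "sil"
        labels.append(f"{left}-{v}+{right}")
    return labels

# e.g. for a hypothetical three-viseme utterance:
print(context_dependent_labels(["V_m", "V_a", "V_p"]))
# → ['sil-V_m+V_a', 'V_m-V_a+V_p', 'V_a-V_p+sil']
```

With a 16-viseme inventory, exhaustive left/right conditioning yields up to 16 × 16 × 16 context-dependent models, which is why such systems typically tie or cluster rarely observed contexts.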

Original language: English
Title of host publication: Proceedings EC-VIP-MC 2003 - 4th EURASIP Conference Focused on Video / Image Processing and Multimedia Communications
Editors: Sonja Grgic, Mislav Grgic
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 649-654
Number of pages: 6
ISBN (Electronic): 9531840547, 9789531840545
DOIs
State: Published - 2003
Event: 4th EURASIP Conference Focused on Video / Image Processing and Multimedia Communications, EC-VIP-MC 2003 - Zagreb, Croatia
Duration: 2 Jul 2003 - 5 Jul 2003

Publication series

Name: Proceedings EC-VIP-MC 2003 - 4th EURASIP Conference Focused on Video / Image Processing and Multimedia Communications
Volume: 2

Conference

Conference: 4th EURASIP Conference Focused on Video / Image Processing and Multimedia Communications, EC-VIP-MC 2003
Country/Territory: Croatia
City: Zagreb
Period: 2/07/03 - 5/07/03

Keywords

  • Animation
  • Automatic speech recognition
  • Avatars
  • Context modeling
  • Hidden Markov models
  • Mouth
  • Robustness
  • Shape
  • Speech processing
  • Speech recognition
