A comparative study of audio features for audio-to-visual conversion in MPEG-4 compliant facial animation

Lei Xie, Zhi Qiang Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

21 Citations (Scopus)

Abstract

Audio-to-visual conversion is the basic problem of speech-driven facial animation. Since the conversion problem is to predict facial control parameters from the acoustic speech, an informative representation of the audio, i.e., the audio feature, is important for obtaining a good prediction. This paper presents a performance comparison of prosodic features, articulatory features, and perceptual features for the audio-to-visual conversion problem on a common test bed. Experimental results show that the Mel frequency cepstral coefficients (MFCCs) produce the best performance, followed by the perceptual linear prediction coefficients (PLPCs), the linear predictive cepstral coefficients (LPCCs), and the prosodic feature set (F0 and energy). The combination of the three kinds of features can further improve the prediction performance on facial parameters, which indicates that different audio features carry complementary information relevant to facial animation.
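The MFCC pipeline named in the abstract can be sketched in a few steps: framing with a Hamming window, power spectrum, mel filterbank, log compression, and a DCT-II. Below is a minimal NumPy illustration under assumed parameters (16 kHz audio, 512-point FFT, 26 mel bands, 13 coefficients); the function name `mfcc` and all parameter values are illustrative choices, not the paper's exact configuration.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC extraction sketch: framing, Hamming window,
    power spectrum, mel filterbank, log, and DCT-II."""
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hamming(n_fft)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank between 0 Hz and the Nyquist frequency.
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    # Log mel energies, then DCT-II to decorrelate; keep n_ceps coefficients.
    logmel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return logmel @ dct.T

# Example: one second of synthetic noise stands in for a speech signal.
x = np.random.default_rng(0).standard_normal(16000)
feats = mfcc(x)
print(feats.shape)  # (97, 13): one 13-dim MFCC vector per 10 ms frame
```

In a speech-driven animation system, each such per-frame feature vector would then be mapped to MPEG-4 facial animation parameters by the learned audio-to-visual model.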

Original language: English
Title of host publication: Proceedings of the 2006 International Conference on Machine Learning and Cybernetics
Pages: 4359-4364
Number of pages: 6
DOI
Publication status: Published - 2006
Externally published: Yes
Event: 2006 International Conference on Machine Learning and Cybernetics - Dalian, China
Duration: 13 Aug 2006 → 16 Aug 2006

Publication series

Name: Proceedings of the 2006 International Conference on Machine Learning and Cybernetics
Year: 2006

Conference

Conference: 2006 International Conference on Machine Learning and Cybernetics
Country/Territory: China
City: Dalian
Period: 13/08/06 → 16/08/06
