TY - GEN
T1 - A comparative study of audio features for audio-to-visual conversion in MPEG-4 compliant facial animation
AU - Xie, Lei
AU - Liu, Zhi Qiang
PY - 2006
Y1 - 2006
N2 - Audio-to-visual conversion is the basic problem of speech-driven facial animation. Since the conversion problem is to predict facial control parameters from the acoustic speech, the informative representation of audio, i.e., the audio feature, is important to get a good prediction. This paper presents a performance comparison on prosodic features, articulatory features, and perceptual features for the audio-to-visual conversion problem on a common test bed. Experimental results show that the Mel frequency cepstral coefficients (MFCCs) produce the best performance, followed by the perceptual linear prediction coefficients (PLPC), the linear predictive cepstral coefficients (LPCCs), and the prosodic feature set (F0 and energy). The combination of three kinds of features can further improve the prediction performance on facial parameters. It unveils that different audio features carry complementary information relevant to facial animation.
AB - Audio-to-visual conversion is the basic problem of speech-driven facial animation. Since the conversion problem is to predict facial control parameters from the acoustic speech, the informative representation of audio, i.e., the audio feature, is important to get a good prediction. This paper presents a performance comparison on prosodic features, articulatory features, and perceptual features for the audio-to-visual conversion problem on a common test bed. Experimental results show that the Mel frequency cepstral coefficients (MFCCs) produce the best performance, followed by the perceptual linear prediction coefficients (PLPC), the linear predictive cepstral coefficients (LPCCs), and the prosodic feature set (F0 and energy). The combination of three kinds of features can further improve the prediction performance on facial parameters. It unveils that different audio features carry complementary information relevant to facial animation.
KW - Audio features
KW - Audio-to-visual conversion
KW - Facial animation
KW - MPEG-4
KW - Talking face
UR - http://www.scopus.com/inward/record.url?scp=33947224137&partnerID=8YFLogxK
U2 - 10.1109/ICMLC.2006.259085
DO - 10.1109/ICMLC.2006.259085
M3 - Conference contribution
AN - SCOPUS:33947224137
SN - 1424400619
SN - 9781424400614
T3 - Proceedings of the 2006 International Conference on Machine Learning and Cybernetics
SP - 4359
EP - 4364
BT - Proceedings of the 2006 International Conference on Machine Learning and Cybernetics
T2 - 2006 International Conference on Machine Learning and Cybernetics
Y2 - 13 August 2006 through 16 August 2006
ER -