TY - GEN
T1 - An articulatory approach to video-realistic mouth animation
AU - Xie, Lei
AU - Liu, Zhi Qiang
PY - 2006
Y1 - 2006
N2 - We propose an articulatory approach that is capable of converting speaker-independent continuous speech into video-realistic mouth animation. We directly model the motions of articulators, such as the lips, tongue, and teeth, using a Dynamic Bayesian Network (DBN)-structured articulatory model (AM). We also present an EM-based conversion algorithm that converts audio to animation parameters by maximizing the likelihood of these parameters given the input audio and the AMs. We further extend the AMs with the introduction of speech context information, resulting in context-dependent articulatory models (CD-AMs). Objective evaluations on the JEWEL testing set show that the animation parameters estimated by the proposed AMs and CD-AMs follow the real parameters more accurately than those of phoneme-based models (PMs) and their context-dependent counterparts (CD-PMs). Subjective evaluations on an AV subjective testing set, which collects various AV content from the Internet, also demonstrate that the AMs and CD-AMs generate more natural and realistic mouth animations, with the CD-AMs achieving the best performance.
UR - http://www.scopus.com/inward/record.url?scp=33947648491&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33947648491
SN - 142440469X
SN - 9781424404698
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - I593-I596
BT - 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings
T2 - 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006
Y2 - 14 May 2006 through 19 May 2006
ER -