An articulatory approach to video-realistic mouth animation

Lei Xie, Zhi Qiang Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

We propose an articulatory approach capable of converting speaker-independent continuous speech into video-realistic mouth animation. We directly model the motions of articulators, such as the lips, tongue, and teeth, using a Dynamic Bayesian Network (DBN)-structured articulatory model (AM). We also present an EM-based conversion algorithm that converts audio to animation parameters by maximizing the likelihood of these parameters given the input audio and the AMs. We further extend the AMs by introducing speech context information, resulting in context-dependent articulatory models (CD-AMs). Objective evaluations on the JEWEL testing set show that the animation parameters estimated by the proposed AMs and CD-AMs follow the real parameters more accurately than those of phoneme-based models (PMs) and their context-dependent counterparts (CD-PMs). Subjective evaluations on an AV subjective testing set, which collects various AV content from the Internet, also demonstrate that the AMs and CD-AMs generate more natural and realistic mouth animations, with the CD-AMs achieving the best performance.
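The conversion step estimates the animation (visual) parameters that best explain the input audio under the trained models. A minimal sketch of this idea, assuming a single joint Gaussian over concatenated audio-visual features rather than the paper's DBN-structured AMs (a deliberate simplification; all variable names, dimensions, and the random "trained" model below are hypothetical): for a joint Gaussian, the likelihood-maximizing visual parameters given an audio frame are the conditional mean.

```python
import numpy as np

# Hypothetical single-Gaussian stand-in for an audio-to-visual conversion model:
# the joint feature vector [audio; visual] is modeled as N(mu, Sigma), and the
# ML visual parameters given audio a are the conditional mean E[visual | a].

rng = np.random.default_rng(0)

d_a, d_v = 3, 2                      # audio / visual feature dimensions (illustrative)
mu = rng.normal(size=d_a + d_v)      # joint mean (stand-in for trained parameters)
A = rng.normal(size=(d_a + d_v, d_a + d_v))
Sigma = A @ A.T + np.eye(d_a + d_v)  # a valid positive-definite joint covariance

def convert(audio):
    """Return the likelihood-maximizing visual parameters for one audio frame."""
    mu_a, mu_v = mu[:d_a], mu[d_a:]
    S_aa = Sigma[:d_a, :d_a]         # audio-audio block of the covariance
    S_va = Sigma[d_a:, :d_a]         # visual-audio cross-covariance block
    return mu_v + S_va @ np.linalg.solve(S_aa, audio - mu_a)

frame = rng.normal(size=d_a)
print(convert(frame).shape)          # one visual parameter vector per audio frame
```

In the paper's setting, the per-frame Gaussians are tied to hidden articulator states of the DBN, so the conversion must additionally marginalize or decode over state sequences, which is what the EM-based algorithm handles.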

Original language: English
Host publication title: 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings
Pages: I593-I596
Publication status: Published - 2006
Externally published: Yes
Event: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006 - Toulouse, France
Duration: 14 May 2006 → 19 May 2006

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
1
ISSN (Print): 1520-6149

Conference

Conference: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006
Country/Territory: France
City: Toulouse
Period: 14/05/06 → 19/05/06
