A Cantonese speech-driven talking face using translingual audio-to-visual conversion

Lei Xie, Helen Meng, Zhi Qiang Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper proposes a novel approach towards a video-realistic, speech-driven talking face for Cantonese. We present a technique that realizes a talking face for a target language (Cantonese) using only audio-visual facial recordings for a base language (English). Given a Cantonese speech input, we first use a Cantonese speech recognizer to generate a Cantonese syllable transcription. Then we map it to an English phoneme transcription via a translingual mapping scheme that involves symbol mapping and time alignment from Cantonese syllables to English phonemes. With the phoneme transcription, the input speech, and the audio-visual models for English, an EM-based conversion algorithm is adopted to generate mouth animation parameters associated with the input Cantonese audio. We have carried out audio-visual syllable recognition experiments to objectively evaluate the proposed talking face. Results show that the visual speech synthesized by the Cantonese talking face can effectively increase the accuracy of Cantonese syllable recognition under noisy acoustic conditions.
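The translingual mapping step described above, symbol mapping plus time alignment from Cantonese syllables to English phonemes, might be sketched as follows. This is a minimal illustration, not the authors' actual scheme: the mapping table entries and the even-split alignment are hypothetical assumptions.

```python
# Hypothetical Cantonese-syllable -> English-phoneme symbol map
# (Jyutping syllables to ARPAbet-style phonemes; example entries only).
SYLLABLE_TO_PHONEMES = {
    "nei5": ["n", "ey"],
    "hou2": ["hh", "ow"],
}

def map_syllables(timed_syllables):
    """Map a timed Cantonese syllable transcription to a timed English
    phoneme transcription.

    timed_syllables: list of (syllable, start_sec, end_sec) tuples.
    Returns a list of (phoneme, start_sec, end_sec) tuples, dividing
    each syllable's time span evenly among its mapped phonemes
    (a crude stand-in for a real time-alignment scheme).
    """
    phoneme_track = []
    for syllable, start, end in timed_syllables:
        phonemes = SYLLABLE_TO_PHONEMES[syllable]
        step = (end - start) / len(phonemes)
        for i, ph in enumerate(phonemes):
            phoneme_track.append((ph, start + i * step, start + (i + 1) * step))
    return phoneme_track

# Example: "nei5 hou2" ("hello") with syllable boundaries from a recognizer.
track = map_syllables([("nei5", 0.0, 0.3), ("hou2", 0.3, 0.7)])
```

The resulting timed phoneme track is what the downstream EM-based conversion would consume, together with the input audio and the English audio-visual models, to produce mouth animation parameters.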

Original language: English
Title of host publication: Chinese Spoken Language Processing - 5th International Symposium, ISCSLP 2006, Proceedings
Pages: 627-639
Number of pages: 13
DOI
Publication status: Published - 2006
Published externally: Yes
Event: 5th International Symposium on Chinese Spoken Language Processing, ISCSLP 2006 - Singapore, Singapore
Duration: 13 Dec 2006 → 16 Dec 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4274 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th International Symposium on Chinese Spoken Language Processing, ISCSLP 2006
Country/Territory: Singapore
City: Singapore
Period: 13/12/06 → 16/12/06
