Lip assistant: Visualize speech for hearing impaired people in multimedia services

Lei Xie; Yi Wang; Zhi Qiang Liu

doi:10.1109/ICSMC.2006.384815

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

4 Citations (Scopus)

Abstract

This paper presents a very low bit rate speech-to-video synthesizer, named lip assistant, to help hearing-impaired people better access multimedia services via lipreading. Lip assistant automatically converts acoustic speech to lip parameters at a bit rate of 2.2 kbps and decodes them into video-realistic mouth animation on the fly. We use multi-stream HMMs (MSHMMs) and principal component analysis (PCA) to model the audio-visual speech and the visual articulations, which are learned from audio-visual (AV) facial recordings. Speech is converted to lip parameters with natural dynamics by an expectation-maximization (EM)-based audio-to-lip converter. The video synthesizer generates video-realistic mouth animations from the encoded lip parameters via PCA expansion. Finally, the mouth animation is superimposed on the original video as an aid that helps hearing-impaired viewers better understand the audio-visual content. Experimental results show that lip assistant can significantly improve speech intelligibility for both machines and humans.
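The abstract states that mouth animation is synthesized from the encoded lip parameters "via PCA expansion". The sketch below illustrates only that generic idea: a frame is represented by its projection onto a few principal components and is rebuilt as the mean image plus a weighted sum of basis images. It assumes each mouth image is flattened into a vector and that the basis is learned with NumPy's SVD; the names `fit_pca`, `encode` and `decode`, the image size and the synthetic data are all hypothetical and do not come from the paper, which does not state its image size, number of components or quantization scheme here.

```python
import numpy as np


def fit_pca(frames: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
    """Learn a PCA basis from training frames (hypothetical helper).

    frames: shape (n_frames, n_pixels), one flattened mouth image per row.
    Returns (mean, basis); basis has shape (n_components, n_pixels) and
    its rows are orthonormal principal directions.
    """
    mean = frames.mean(axis=0)
    centered = frames - mean
    # Rows of vt are the principal directions, ordered by decreasing
    # singular value (i.e. decreasing explained variance).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def encode(frame: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project one frame onto the basis to obtain its low-dimensional parameters."""
    return basis @ (frame - mean)


def decode(params: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """PCA expansion: mean image plus a parameter-weighted sum of basis images."""
    return mean + params @ basis


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in data: 200 flattened 32x24 "mouth images" driven by
    # 5 latent factors plus slight noise (not the paper's AV recordings).
    latent = rng.normal(size=(200, 5))
    mixing = rng.normal(size=(5, 32 * 24))
    frames = latent @ mixing + 0.01 * rng.normal(size=(200, 32 * 24))

    mean, basis = fit_pca(frames, n_components=5)
    params = encode(frames[0], mean, basis)  # 5 numbers instead of 768 pixels
    recon = decode(params, mean, basis)
    print(params.shape, float(np.abs(recon - frames[0]).max()))
```

Taking the SVD of the centered data, rather than eigendecomposing the pixel covariance matrix, avoids forming an n_pixels × n_pixels matrix. In a low bit rate setting, only the short parameter vector per frame would need to be transmitted, with the mean and basis known to the decoder in advance.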

Original language	English
Host publication title	2006 IEEE International Conference on Systems, Man and Cybernetics
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	4331-4336
Number of pages	6
ISBN (Print)	1424401003, 9781424401000
DOI	https://doi.org/10.1109/ICSMC.2006.384815
Publication status	Published - 2006
Externally published	Yes
Event	2006 IEEE International Conference on Systems, Man and Cybernetics - Taipei, Taiwan. Duration: 8 Oct 2006 → 11 Oct 2006

Publication series

Name	Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
Volume	5
ISSN (Print)	1062-922X

Conference

Conference	2006 IEEE International Conference on Systems, Man and Cybernetics
Country/Territory	Taiwan
City	Taipei
Period	8 Oct 2006 → 11 Oct 2006

Cite this

Xie, L., Wang, Y., & Liu, Z. Q. (2006). Lip assistant: Visualize speech for hearing impaired people in multimedia services. In 2006 IEEE International Conference on Systems, Man and Cybernetics (pp. 4331-4336). Article 4274580 (Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics; Vol. 5). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICSMC.2006.384815

@inproceedings{b276aa2c31794af591d71b15966c5a09,

title = "Lip assistant: Visualize speech for hearing impaired people in multimedia services",

abstract = "This paper presents a very low bit rate speech-to-video synthesizer, named lip assistant, to help hearing-impaired people better access multimedia services via lipreading. Lip assistant automatically converts acoustic speech to lip parameters at a bit rate of 2.2 kbps and decodes them into video-realistic mouth animation on the fly. We use multi-stream HMMs (MSHMMs) and principal component analysis (PCA) to model the audio-visual speech and the visual articulations, which are learned from audio-visual (AV) facial recordings. Speech is converted to lip parameters with natural dynamics by an expectation-maximization (EM)-based audio-to-lip converter. The video synthesizer generates video-realistic mouth animations from the encoded lip parameters via PCA expansion. Finally, the mouth animation is superimposed on the original video as an aid that helps hearing-impaired viewers better understand the audio-visual content. Experimental results show that lip assistant can significantly improve speech intelligibility for both machines and humans.",

author = "Lei Xie and Yi Wang and Liu, {Zhi Qiang}",

year = "2006",

doi = "10.1109/ICSMC.2006.384815",

language = "English",

isbn = "1424401003",

series = "Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "4331--4336",

booktitle = "2006 IEEE International Conference on Systems, Man and Cybernetics",

note = "2006 IEEE International Conference on Systems, Man and Cybernetics ; Conference date: 08-10-2006 Through 11-10-2006",

}

Xie, L, Wang, Y & Liu, ZQ 2006, Lip assistant: Visualize speech for hearing impaired people in multimedia services. in 2006 IEEE International Conference on Systems, Man and Cybernetics, 4274580, Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics, vol. 5, Institute of Electrical and Electronics Engineers Inc., pp. 4331-4336, 2006 IEEE International Conference on Systems, Man and Cybernetics, Taipei, Taiwan, 8/10/06. https://doi.org/10.1109/ICSMC.2006.384815

Lip assistant: Visualize speech for hearing impaired people in multimedia services. / Xie, Lei; Wang, Yi; Liu, Zhi Qiang.
2006 IEEE International Conference on Systems, Man and Cybernetics. Institute of Electrical and Electronics Engineers Inc., 2006. p. 4331-4336, 4274580 (Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics; Vol. 5).

TY - GEN

T1 - Lip assistant

T2 - 2006 IEEE International Conference on Systems, Man and Cybernetics

AU - Xie, Lei

AU - Wang, Yi

AU - Liu, Zhi Qiang

PY - 2006

Y1 - 2006


AB - This paper presents a very low bit rate speech-to-video synthesizer, named lip assistant, to help hearing-impaired people better access multimedia services via lipreading. Lip assistant automatically converts acoustic speech to lip parameters at a bit rate of 2.2 kbps and decodes them into video-realistic mouth animation on the fly. We use multi-stream HMMs (MSHMMs) and principal component analysis (PCA) to model the audio-visual speech and the visual articulations, which are learned from audio-visual (AV) facial recordings. Speech is converted to lip parameters with natural dynamics by an expectation-maximization (EM)-based audio-to-lip converter. The video synthesizer generates video-realistic mouth animations from the encoded lip parameters via PCA expansion. Finally, the mouth animation is superimposed on the original video as an aid that helps hearing-impaired viewers better understand the audio-visual content. Experimental results show that lip assistant can significantly improve speech intelligibility for both machines and humans.

UR - http://www.scopus.com/inward/record.url?scp=34548125702&partnerID=8YFLogxK

U2 - 10.1109/ICSMC.2006.384815

DO - 10.1109/ICSMC.2006.384815

M3 - Conference contribution

AN - SCOPUS:34548125702

SN - 1424401003

SN - 9781424401000

T3 - Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics

SP - 4331

EP - 4336

BT - 2006 IEEE International Conference on Systems, Man and Cybernetics

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 8 October 2006 through 11 October 2006

ER -

Xie L, Wang Y, Liu ZQ. Lip assistant: Visualize speech for hearing impaired people in multimedia services. In 2006 IEEE International Conference on Systems, Man and Cybernetics. Institute of Electrical and Electronics Engineers Inc. 2006. p. 4331-4336. 4274580. (Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics). doi: 10.1109/ICSMC.2006.384815