Multi-stream articulator model with adaptive reliability measure for audio visual speech recognition

Lei Xie, Zhi Qiang Liu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

4 Citations (Scopus)

Abstract

We propose a multi-stream articulator model (MSAM) for audio-visual speech recognition (AVSR). The model extends the articulator modelling technique recently used in audio-only speech recognition to the audio-visual domain. It uses a multiple-stream structure with a shared articulator layer to mimic the speech production process. We also present an adaptive reliability measure (ARM), based on two local dispersion indicators, that integrates the audio and visual streams according to their local, temporal reliability. Experiments on the AVCONDIG database show that our model achieves recognition performance comparable to that of the multi-stream hidden Markov model (MSHMM) under various noisy conditions. With the help of the ARM, our model achieves the best performance at some of the tested SNRs.
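The abstract does not define the two local dispersion indicators or how the ARM turns them into stream weights. The following is therefore only a minimal sketch of the general idea of dispersion-based stream weighting, not the authors' ARM. It assumes that each stream supplies per-class log-likelihoods for a short local segment, that reliability is approximated by the mean pairwise gap among the N-best log-likelihoods, and that the streams are fused linearly with weights that sum to one. The names nbest_dispersion, audio_weight and fuse are hypothetical.

```python
def nbest_dispersion(log_likes, n_best=4):
    """Mean pairwise gap among the n_best highest log-likelihoods.

    A large value means the stream clearly separates its best hypotheses,
    which is taken here (as an assumption) as a sign of reliability.
    """
    top = sorted(log_likes, reverse=True)[:n_best]
    if len(top) < 2:
        return 0.0
    gaps = [top[i] - top[j]
            for i in range(len(top))
            for j in range(i + 1, len(top))]
    return sum(gaps) / len(gaps)


def audio_weight(audio_ll, visual_ll, n_best=4):
    """Audio stream weight in [0, 1] derived from the two dispersions."""
    d_audio = nbest_dispersion(audio_ll, n_best)
    d_visual = nbest_dispersion(visual_ll, n_best)
    total = d_audio + d_visual
    if total == 0.0:
        # Neither stream discriminates: fall back to equal weights.
        return 0.5
    return d_audio / total


def fuse(audio_ll, visual_ll, n_best=4):
    """Weighted sum of per-class log-likelihoods from both streams."""
    lam = audio_weight(audio_ll, visual_ll, n_best)
    return [lam * a + (1.0 - lam) * v for a, v in zip(audio_ll, visual_ll)]


if __name__ == "__main__":
    # Illustrative numbers only: a flat (noisy) audio stream and a
    # confident visual stream over three candidate classes.
    noisy_audio = [-5.0, -5.1, -5.2]
    visual = [-8.0, -2.0, -9.0]
    print(audio_weight(noisy_audio, visual))
    print(fuse(noisy_audio, visual))
```

In this sketch a flat audio score distribution, as might be expected at low SNR, shifts the weight toward the visual stream, and recomputing the weight per segment is what makes it local and temporal.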

Original language: English
Title of host publication: Advances in Machine Learning and Cybernetics - 4th International Conference, ICMLC 2005, Revised Selected Papers
Pages: 994-1004
Number of pages: 11
Publication status: Published - 2006
Externally published: Yes
Event: 4th International Conference on Machine Learning and Cybernetics, ICMLC 2005 - Guangzhou, China
Duration: 18 Aug 2005 to 21 Aug 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3930 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 4th International Conference on Machine Learning and Cybernetics, ICMLC 2005
Country/Territory: China
City: Guangzhou
Period: 18/08/05 to 21/08/05
