Multi-stream articulator model with adaptive reliability measure for audio visual speech recognition

Lei Xie; Zhi Qiang Liu

doi:10.1007/11739685_104

City University of Hong Kong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

We propose a multi-stream articulator model (MSAM) for audio-visual speech recognition (AVSR). This model extends the articulator modelling technique recently used in audio-only speech recognition to the audio-visual domain. A multiple-stream structure with a shared articulator layer is used in the model to mimic the speech production process. We also present an adaptive reliability measure (ARM) based on two local dispersion indicators, integrating audio and visual streams with local, temporal reliability. Experiments on the AVCONDIG database show that our model achieves recognition performance comparable to that of the multi-stream hidden Markov model (MSHMM) under various noisy conditions. With the help of the ARM, our model even performs best at some test SNRs.
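
The abstract does not give the ARM formula, so the following is only a minimal sketch of the general idea of weighting streams by a local dispersion indicator. It assumes a commonly used indicator (the mean pairwise gap among the N-best class log-likelihoods of a stream at one frame) and a linear weighting of stream log-likelihoods; the function names `dispersion`, `audio_weight` and `fuse`, and the parameter `n_best`, are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def dispersion(log_likes, n_best=4):
    """Mean pairwise gap among the n_best largest log-likelihoods.

    A peaked (confident) stream gives a large value, a flat one a value near 0.
    Assumed indicator for illustration; requires at least two classes.
    """
    top = sorted(log_likes, reverse=True)[:n_best]
    pairs = list(combinations(top, 2))  # sorted input, so a >= b in each pair
    return sum(a - b for a, b in pairs) / len(pairs)


def audio_weight(audio_ll, visual_ll, n_best=4):
    """Audio stream weight in [0, 1] from the two dispersion values."""
    d_a = dispersion(audio_ll, n_best)
    d_v = dispersion(visual_ll, n_best)
    total = d_a + d_v
    if total == 0.0:
        return 0.5  # neither stream is informative: weight them equally
    return d_a / total


def fuse(audio_ll, visual_ll, n_best=4):
    """Per-class weighted sum of the two streams' log-likelihoods."""
    lam = audio_weight(audio_ll, visual_ll, n_best)
    return [lam * a + (1.0 - lam) * v for a, v in zip(audio_ll, visual_ll)]


# One frame, three candidate classes: audio is peaked, video is nearly flat.
clean_audio = [-1.0, -5.0, -9.0]
flat_video = [-3.0, -3.1, -3.2]
fused = fuse(clean_audio, flat_video)
```

Because the weight is recomputed for every frame, a stream that becomes unreliable (for example, audio during a noise burst) is down-weighted only where its scores flatten out, which is the sense of "local, temporal reliability" in this sketch.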

Original language	English
Title of host publication	Advances in Machine Learning and Cybernetics - 4th International Conference, ICMLC 2005, Revised Selected Papers
Pages	994-1004
Number of pages	11
DOI	https://doi.org/10.1007/11739685_104
Publication status	Published - 2006
Externally published	Yes
Event	4th International Conference on Machine Learning and Cybernetics, ICMLC 2005 - Guangzhou, China. Duration: 18 Aug 2005 → 21 Aug 2005

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	3930 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	4th International Conference on Machine Learning and Cybernetics, ICMLC 2005
Country/Territory	China
City	Guangzhou
Period	18/08/05 → 21/08/05


Cite this

Xie, L., & Liu, Z. Q. (2006). Multi-stream articulator model with adaptive reliability measure for audio visual speech recognition. In Advances in Machine Learning and Cybernetics - 4th International Conference, ICMLC 2005, Revised Selected Papers (pp. 994-1004). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3930 LNAI). https://doi.org/10.1007/11739685_104

Xie, Lei ; Liu, Zhi Qiang. / Multi-stream articulator model with adaptive reliability measure for audio visual speech recognition. Advances in Machine Learning and Cybernetics - 4th International Conference, ICMLC 2005, Revised Selected Papers. 2006. pp. 994-1004 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{e366a53677474f4992150317128760ce,
title = "Multi-stream articulator model with adaptive reliability measure for audio visual speech recognition",
abstract = "We propose a multi-stream articulator model (MSAM) for audio-visual speech recognition (AVSR). This model extends the articulator modelling technique recently used in audio-only speech recognition to the audio-visual domain. A multiple-stream structure with a shared articulator layer is used in the model to mimic the speech production process. We also present an adaptive reliability measure (ARM) based on two local dispersion indicators, integrating audio and visual streams with local, temporal reliability. Experiments on the AVCONDIG database show that our model achieves recognition performance comparable to that of the multi-stream hidden Markov model (MSHMM) under various noisy conditions. With the help of the ARM, our model even performs best at some test SNRs.",
author = "Lei Xie and Liu, {Zhi Qiang}",
year = "2006",
doi = "10.1007/11739685_104",
language = "English",
isbn = "3540335846",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "994--1004",
booktitle = "Advances in Machine Learning and Cybernetics - 4th International Conference, ICMLC 2005, Revised Selected Papers",
note = "4th International Conference on Machine Learning and Cybernetics, ICMLC 2005 ; Conference date: 18-08-2005 Through 21-08-2005",
}

Xie, L & Liu, ZQ 2006, Multi-stream articulator model with adaptive reliability measure for audio visual speech recognition. In Advances in Machine Learning and Cybernetics - 4th International Conference, ICMLC 2005, Revised Selected Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 3930 LNAI, pp. 994-1004, 4th International Conference on Machine Learning and Cybernetics, ICMLC 2005, Guangzhou, China, 18/08/05. https://doi.org/10.1007/11739685_104

Multi-stream articulator model with adaptive reliability measure for audio visual speech recognition. / Xie, Lei; Liu, Zhi Qiang.
Advances in Machine Learning and Cybernetics - 4th International Conference, ICMLC 2005, Revised Selected Papers. 2006. pp. 994-1004 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3930 LNAI).

TY - GEN
T1 - Multi-stream articulator model with adaptive reliability measure for audio visual speech recognition
AU - Xie, Lei
AU - Liu, Zhi Qiang
PY - 2006
Y1 - 2006
N2 - We propose a multi-stream articulator model (MSAM) for audio-visual speech recognition (AVSR). This model extends the articulator modelling technique recently used in audio-only speech recognition to the audio-visual domain. A multiple-stream structure with a shared articulator layer is used in the model to mimic the speech production process. We also present an adaptive reliability measure (ARM) based on two local dispersion indicators, integrating audio and visual streams with local, temporal reliability. Experiments on the AVCONDIG database show that our model achieves recognition performance comparable to that of the multi-stream hidden Markov model (MSHMM) under various noisy conditions. With the help of the ARM, our model even performs best at some test SNRs.
AB - We propose a multi-stream articulator model (MSAM) for audio-visual speech recognition (AVSR). This model extends the articulator modelling technique recently used in audio-only speech recognition to the audio-visual domain. A multiple-stream structure with a shared articulator layer is used in the model to mimic the speech production process. We also present an adaptive reliability measure (ARM) based on two local dispersion indicators, integrating audio and visual streams with local, temporal reliability. Experiments on the AVCONDIG database show that our model achieves recognition performance comparable to that of the multi-stream hidden Markov model (MSHMM) under various noisy conditions. With the help of the ARM, our model even performs best at some test SNRs.
UR - http://www.scopus.com/inward/record.url?scp=33745797487&partnerID=8YFLogxK
U2 - 10.1007/11739685_104
DO - 10.1007/11739685_104
M3 - Conference contribution
AN - SCOPUS:33745797487
SN - 3540335846
SN - 9783540335849
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 994
EP - 1004
BT - Advances in Machine Learning and Cybernetics - 4th International Conference, ICMLC 2005, Revised Selected Papers
T2 - 4th International Conference on Machine Learning and Cybernetics, ICMLC 2005
Y2 - 18 August 2005 through 21 August 2005
ER -

Xie L, Liu ZQ. Multi-stream articulator model with adaptive reliability measure for audio visual speech recognition. In Advances in Machine Learning and Cybernetics - 4th International Conference, ICMLC 2005, Revised Selected Papers. 2006. p. 994-1004. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/11739685_104
