Adaptive stream reliability modeling based on local dispersion measures for audio visual speech recognition

Lei Xie; Rong Chun Zhao; Zhi Qiang Liu

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

This paper proposes an adaptive stream reliability modeling technique for audio visual speech recognition (AVSR). As recognition conditions vary locally, we present two local measures - frame and window dispersions to depict the temporal discriminative powers and noise levels of both audio and visual streams. The dispersions are subsequently mapped to stream exponents according to the minimum classification error (MCE) criterion. Experiments on a connected-digits task show that our method consistently outperforms the popular Discriminative Training (DT) and Grid Search (GS) methods at various signal noise ratios (SNRs), improving for example word accuracy rate (WAR) from 94.7% to 96.4% at 28dB SNR.

Original language	English
Title of host publication	2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005
Pages	4852-4857
Number of pages	6
Publication status	Published - 2005
Event	International Conference on Machine Learning and Cybernetics, ICMLC 2005 - Guangzhou, China. Duration: 18 Aug 2005 → 21 Aug 2005

Publication series

Name	2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005

Conference

Conference	International Conference on Machine Learning and Cybernetics, ICMLC 2005
Country/Territory	China
City	Guangzhou
Period	18/08/05 → 21/08/05

Cite this

@inproceedings{e9b709b61adb48cbab1138fe2e40bc39,
title = "Adaptive stream reliability modeling based on local dispersion measures for audio visual speech recognition",
abstract = "This paper proposes an adaptive stream reliability modeling technique for audio visual speech recognition (AVSR). As recognition conditions vary locally, we present two local measures - frame and window dispersions to depict the temporal discriminative powers and noise levels of both audio and visual streams. The dispersions are subsequently mapped to stream exponents according to the minimum classification error (MCE) criterion. Experiments on a connected-digits task show that our method consistently outperforms the popular Discriminative Training (DT) and Grid Search (GS) methods at various signal noise ratios (SNRs), improving for example word accuracy rate (WAR) from 94.7% to 96.4% at 28dB SNR.",
keywords = "Audio visual speech recognition, Dispersion, Lipreading, MCE-GPD, Stream exponents",
author = "Xie, Lei and Zhao, {Rong Chun} and Liu, {Zhi Qiang}",
year = "2005",
language = "English",
isbn = "078039092X",
series = "2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005",
pages = "4852--4857",
booktitle = "2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005",
note = "International Conference on Machine Learning and Cybernetics, ICMLC 2005 ; Conference date: 18-08-2005 Through 21-08-2005",
}

Xie, L, Zhao, RC & Liu, ZQ 2005, Adaptive stream reliability modeling based on local dispersion measures for audio visual speech recognition. in 2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005. 2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005, pp. 4852-4857, International Conference on Machine Learning and Cybernetics, ICMLC 2005, Guangzhou, China, 18/08/05.

Adaptive stream reliability modeling based on local dispersion measures for audio visual speech recognition. / Xie, Lei; Zhao, Rong Chun; Liu, Zhi Qiang.
2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005. 2005. pp. 4852-4857 (2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005).

TY - GEN
T1 - Adaptive stream reliability modeling based on local dispersion measures for audio visual speech recognition
AU - Xie, Lei
AU - Zhao, Rong Chun
AU - Liu, Zhi Qiang
PY - 2005
Y1 - 2005
AB - This paper proposes an adaptive stream reliability modeling technique for audio visual speech recognition (AVSR). As recognition conditions vary locally, we present two local measures - frame and window dispersions to depict the temporal discriminative powers and noise levels of both audio and visual streams. The dispersions are subsequently mapped to stream exponents according to the minimum classification error (MCE) criterion. Experiments on a connected-digits task show that our method consistently outperforms the popular Discriminative Training (DT) and Grid Search (GS) methods at various signal noise ratios (SNRs), improving for example word accuracy rate (WAR) from 94.7% to 96.4% at 28dB SNR.
KW - Audio visual speech recognition
KW - Dispersion
KW - Lipreading
KW - MCE-GPD
KW - Stream exponents
UR - http://www.scopus.com/inward/record.url?scp=28444438538&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:28444438538
SN - 078039092X
SN - 9780780390928
T3 - 2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005
SP - 4852
EP - 4857
BT - 2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005
T2 - International Conference on Machine Learning and Cybernetics, ICMLC 2005
Y2 - 18 August 2005 through 21 August 2005
ER -
