Speech pattern discovery using audio-visual fusion and canonical correlation analysis

Lei Xie, Yinqing Xu, Lilei Zheng, Qiang Huang, Bingfeng Li

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Peer-reviewed

1 Citation (Scopus)

Abstract

In this paper, we address the problem of automatic speech pattern discovery using audio-visual information fusion. Unlike previous studies based on the audio modality alone, our work uses not only acoustic information but also visual features extracted from the mouth region. To make effective use of the multimodal information, we apply several audio-visual fusion strategies: feature concatenation, similarity weighting and decision fusion. Specifically, our decision fusion approach retains the reliable patterns discovered in the audio and visual modalities. Moreover, we use canonical correlation analysis (CCA) to address the temporal asynchrony between the audio and visual speech modalities, and we adopt unbounded dynamic time warping (UDTW) to search for speech patterns in the audio and visual similarity matrices computed on the aligned audio and visual sequences. Experiments on an audio-visual corpus show that, for the first time, speech pattern discovery can be improved by the use of visual information. The decision fusion approach outperforms standard feature concatenation and similarity weighting, and CCA-based audio-visual synchronization plays an important role in the performance improvement.
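
The record gives no implementation details, so the following is only a minimal sketch of two ideas named in the abstract: fusing audio and visual streams by feature concatenation or similarity weighting, and using CCA to compensate for audio-visual asynchrony. Several things here are assumptions and not the authors' method: the cosine similarity measure, the `audio_weight` parameter, the function names, and in particular the lag search that picks the frame offset with the highest first canonical correlation. Both streams are assumed to be sampled at the same frame rate. Decision fusion and UDTW are not shown.

```python
import numpy as np


def cosine_similarity_matrix(x, y, eps=1e-12):
    """Frame-by-frame cosine similarity between x (n, d) and y (m, d)."""
    xn = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)
    yn = y / np.maximum(np.linalg.norm(y, axis=1, keepdims=True), eps)
    return xn @ yn.T


def fuse_by_concatenation(audio_a, visual_a, audio_b, visual_b):
    """Feature concatenation: stack audio and visual features per frame."""
    a = np.hstack([audio_a, visual_a])
    b = np.hstack([audio_b, visual_b])
    return cosine_similarity_matrix(a, b)


def fuse_by_similarity_weighting(audio_a, visual_a, audio_b, visual_b,
                                 audio_weight=0.7):
    """Similarity weighting: weighted sum of per-modality similarity matrices.

    audio_weight is a hypothetical tuning parameter, not a value from the paper.
    """
    s_audio = cosine_similarity_matrix(audio_a, audio_b)
    s_visual = cosine_similarity_matrix(visual_a, visual_b)
    return audio_weight * s_audio + (1.0 - audio_weight) * s_visual


def first_canonical_correlation(x, y, reg=1e-6):
    """Largest canonical correlation between paired frames x (n, p), y (n, q)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    lx = np.linalg.cholesky(cxx)
    ly = np.linalg.cholesky(cyy)
    # Whitened cross-covariance: Lx^{-1} Cxy Ly^{-T}; its singular values
    # are the canonical correlations.
    k = np.linalg.solve(lx, cxy)
    k = np.linalg.solve(ly, k.T).T
    return np.linalg.svd(k, compute_uv=False)[0]


def estimate_lag(audio, visual, max_lag=5):
    """Assumed alignment step: the lag (in frames) by which visual trails audio,
    chosen as the offset that maximizes the first canonical correlation."""
    n = min(len(audio), len(visual))
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, v = audio[:n - lag], visual[lag:n]
        else:
            a, v = audio[-lag:n], visual[:n + lag]
        corr = first_canonical_correlation(a, v)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag
```

Under these assumptions, the visual stream would first be shifted by the estimated lag, and the fused similarity matrix would then be handed to the pattern search step.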

Original language: English
Title of host publication: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Pages: 2371-2374
Number of pages: 4
Publication status: Published - 2012
Event: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 - Portland, OR, United States
Duration: 9 Sep 2012 to 13 Sep 2012

Publication series

Name: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Volume: 3

Conference

Conference: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Country/Territory: United States
City: Portland, OR
Period: 9/09/12 to 13/09/12
