Dense Multimodal Fusion for Hierarchically Joint Representation

Di Hu, Chengze Wang, Feiping Nie, Xuelong Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-reviewed

38 Citations (Scopus)

Abstract

Multiple modalities can provide more valuable information than a single one by describing the same contents in various ways. Previous methods mainly focus on fusing either shallow features or the high-level representations generated by unimodal deep networks, which captures only part of the hierarchical correlations across modalities. In this paper, we propose to densely integrate the representations by greedily stacking multiple shared layers between different modality-specific networks, a scheme we name Dense Multimodal Fusion (DMF). The joint representations in different shared layers capture correlations at different levels, and the connections between shared layers also provide an efficient way to learn the dependence among hierarchical correlations. These two properties jointly contribute to the multiple learning paths in DMF, which result in faster convergence, lower training loss, and better performance. We evaluate our model on audiovisual speech recognition and cross-modal retrieval. The noticeable performance gains demonstrate that our model learns a more effective joint representation.
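The abstract describes the fusion scheme only at a high level. The sketch below is not the authors' code; it is a minimal PyTorch illustration, under assumed layer sizes and an assumed audio/visual setting, of the idea of stacking a shared (fusion) layer at every level of two modality-specific networks and chaining the shared layers so higher-level joint representations depend on lower-level ones.

```python
# Minimal sketch (illustrative only) of dense multimodal fusion:
# modality-specific encoders whose intermediate activations are fused by
# shared layers at every level; the shared layers are chained so joint
# representations at higher levels depend on those at lower levels.
# All dimensions below are assumptions, not values from the paper.
import torch
import torch.nn as nn


class DenseMultimodalFusion(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=256, hidden_dims=(512, 256, 128)):
        super().__init__()
        dims_a = [audio_dim] + list(hidden_dims)
        dims_v = [visual_dim] + list(hidden_dims)
        # Modality-specific layers, one per level.
        self.audio_layers = nn.ModuleList(
            nn.Linear(dims_a[i], dims_a[i + 1]) for i in range(len(hidden_dims))
        )
        self.visual_layers = nn.ModuleList(
            nn.Linear(dims_v[i], dims_v[i + 1]) for i in range(len(hidden_dims))
        )
        # One shared layer per level; each consumes the concatenated modality
        # activations plus the previous level's shared representation.
        shared_dims = [0] + list(hidden_dims)
        self.shared_layers = nn.ModuleList(
            nn.Linear(2 * hidden_dims[i] + shared_dims[i], hidden_dims[i])
            for i in range(len(hidden_dims))
        )

    def forward(self, audio, visual):
        shared = audio.new_zeros(audio.size(0), 0)  # empty at the first level
        for a_layer, v_layer, s_layer in zip(
            self.audio_layers, self.visual_layers, self.shared_layers
        ):
            audio = torch.relu(a_layer(audio))
            visual = torch.relu(v_layer(visual))
            # The chained shared layers provide the multiple learning paths:
            # each level fuses both modalities and the previous joint code.
            shared = torch.relu(s_layer(torch.cat([audio, visual, shared], dim=1)))
        return shared  # hierarchically joint representation


if __name__ == "__main__":
    model = DenseMultimodalFusion()
    joint = model(torch.randn(4, 128), torch.randn(4, 256))
    print(joint.shape)  # torch.Size([4, 128])
```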

Original language: English
Title of host publication: 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3941-3945
Number of pages: 5
ISBN (electronic): 9781479981311
DOI
Publication status: Published - May 2019
Event: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Brighton, United Kingdom
Duration: 12 May 2019 - 17 May 2019

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2019-May
ISSN (print): 1520-6149

Conference

Conference: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Country/Territory: United Kingdom
City: Brighton
Period: 12/05/19 - 17/05/19
