MSN: Modality separation networks for RGB-D scene recognition

Zhitong Xiong, Yuan Yuan, Qi Wang

Research output: Contribution to journal › Article › peer-review

23 Citations (Scopus)

Abstract

RGB-D image based indoor scene recognition is a challenging task due to complex scene layouts and cluttered objects. Although the depth modality can provide extra geometric information, how to better learn multi-modal features remains an open problem. Considering this, in this paper we propose modality separation networks to extract modal-consistent and modal-specific features simultaneously. The motivation of this work is twofold: 1) to explicitly learn what is unique to each modality and what is common between the two modalities; 2) to explore the relationship between global/local features and modal-specific/consistent features. To this end, the proposed framework contains two branches of submodules to learn the multi-modal features. One branch extracts the individual characteristics of each modality by minimizing the similarity between the two modalities. The other branch learns the common information between the two modalities by maximizing a correlation term. Moreover, with the spatial attention module, our method can visualize the spatial positions on which different submodules focus. We evaluate our method on two public RGB-D scene recognition datasets, and new state-of-the-art results are achieved with the proposed framework.
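The two-branch idea in the abstract — push the modal-specific features of the RGB and depth streams apart while pulling the modal-consistent features together — can be illustrated with two simple loss terms. This is a minimal NumPy sketch under my own assumptions (per-sample cosine-similarity losses; the function names `difference_loss` and `similarity_loss` are hypothetical), not the paper's exact formulation:

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # row-wise L2 normalization: one feature vector per sample
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def difference_loss(f_rgb, f_depth):
    # modal-specific branch: mean squared cosine similarity between
    # the two modalities' features; minimizing this pushes the
    # modality-specific representations toward orthogonality
    a, b = l2_normalize(f_rgb), l2_normalize(f_depth)
    return float(np.mean(np.sum(a * b, axis=1) ** 2))

def similarity_loss(f_rgb, f_depth):
    # modal-consistent branch: negative mean cosine similarity;
    # minimizing this maximizes the correlation between the
    # shared (modal-consistent) features of the two streams
    a, b = l2_normalize(f_rgb), l2_normalize(f_depth)
    return float(-np.mean(np.sum(a * b, axis=1)))

# toy features: 2 samples, 2-dimensional
rgb = np.array([[1.0, 0.0], [0.0, 1.0]])
depth_orth = np.array([[0.0, 1.0], [1.0, 0.0]])  # orthogonal per sample

print(difference_loss(rgb, depth_orth))  # near 0: features already separated
print(similarity_loss(rgb, rgb))         # near -1: identical features correlate
```

In a full training setup both terms would be weighted and added to the classification loss, so that one branch is penalized for cross-modal similarity and the other is rewarded for it.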

Original language: English
Pages (from-to): 81-89
Number of pages: 9
Journal: Neurocomputing
Volume: 373
DOI
Publication status: Published - 15 Jan 2020
