ASK: Adaptively Selecting Key Local Features for RGB-D Scene Recognition

Zhitong Xiong; Yuan Yuan; Qi Wang

doi:10.1109/TIP.2021.3053459

School of Artificial Intelligence, Optics and Electronics

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

40 Scopus citations

Abstract

Indoor scene images usually contain scattered objects and various scene layouts, which make RGB-D scene classification a challenging task. Existing methods still have limitations in classifying scene images with great spatial variability. Thus, how to extract local patch-level features effectively using only image-level labels is still an open problem for RGB-D scene recognition. In this article, we propose an efficient framework for RGB-D scene recognition, which adaptively selects important local features to capture the great spatial variability of scene images. Specifically, we design a differentiable local feature selection (DLFS) module, which can extract the appropriate number of key local scene-related features. Discriminative local theme-level and object-level representations can be selected with the DLFS module from the spatially-correlated multi-modal RGB-D features. We take advantage of the correlation between RGB and depth modalities to provide more cues for selecting local features. To ensure that discriminative local features are selected, a variational mutual information maximization loss is proposed. Additionally, the DLFS module can be easily extended to select local features of different scales. By concatenating the local-orderless and global-structured multi-modal features, the proposed framework can achieve state-of-the-art performance on public RGB-D scene recognition datasets.

Original language	English
Article number	9337174
Pages (from-to)	2722-2733
Number of pages	12
Journal	IEEE Transactions on Image Processing
Volume	30
DOIs	https://doi.org/10.1109/TIP.2021.3053459
State	Published - 2021

Keywords

local feature selection
multi-modal feature learning
RGB-D recognition

Access to Document

10.1109/TIP.2021.3053459

Cite this

@article{9a89e21c776e41cf82a809c0a41dbfab,

title = "ASK: Adaptively Selecting Key Local Features for RGB-D Scene Recognition",

abstract = "Indoor scene images usually contain scattered objects and various scene layouts, which make RGB-D scene classification a challenging task. Existing methods still have limitations for classifying scene images with great spatial variability. Thus, how to extract local patch-level features effectively using only image label is still an open problem for RGB-D scene recognition. In this article, we propose an efficient framework for RGB-D scene recognition, which adaptively selects important local features to capture the great spatial variability of scene images. Specifically, we design a differentiable local feature selection (DLFS) module, which can extract the appropriate number of key local scene-related features. Discriminative local theme-level and object-level representations can be selected with DLFS module from the spatially-correlated multi-modal RGB-D features. We take advantage of the correlation between RGB and depth modalities to provide more cues for selecting local features. To ensure that discriminative local features are selected, the variational mutual information maximization loss is proposed. Additionally, the DLFS module can be easily extended to select local features of different scales. By concatenating the local-orderless and global-structured multi-modal features, the proposed framework can achieve state-of-the-art performance on public RGB-D scene recognition datasets.",

keywords = "local feature selection, multi-modal feature learning, RGB-D recognition",

author = "Zhitong Xiong and Yuan Yuan and Qi Wang",

note = "Publisher Copyright: {\textcopyright} 1992-2012 IEEE.",

year = "2021",

doi = "10.1109/TIP.2021.3053459",

language = "English",

volume = "30",

pages = "2722--2733",

journal = "IEEE Transactions on Image Processing",

issn = "1057-7149",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - ASK

T2 - Adaptively Selecting Key Local Features for RGB-D Scene Recognition

AU - Xiong, Zhitong

AU - Yuan, Yuan

AU - Wang, Qi

PY - 2021

Y1 - 2021

AB - Indoor scene images usually contain scattered objects and various scene layouts, which make RGB-D scene classification a challenging task. Existing methods still have limitations for classifying scene images with great spatial variability. Thus, how to extract local patch-level features effectively using only image label is still an open problem for RGB-D scene recognition. In this article, we propose an efficient framework for RGB-D scene recognition, which adaptively selects important local features to capture the great spatial variability of scene images. Specifically, we design a differentiable local feature selection (DLFS) module, which can extract the appropriate number of key local scene-related features. Discriminative local theme-level and object-level representations can be selected with DLFS module from the spatially-correlated multi-modal RGB-D features. We take advantage of the correlation between RGB and depth modalities to provide more cues for selecting local features. To ensure that discriminative local features are selected, the variational mutual information maximization loss is proposed. Additionally, the DLFS module can be easily extended to select local features of different scales. By concatenating the local-orderless and global-structured multi-modal features, the proposed framework can achieve state-of-the-art performance on public RGB-D scene recognition datasets.

KW - local feature selection

KW - multi-modal feature learning

KW - RGB-D recognition

UR - http://www.scopus.com/inward/record.url?scp=85100475217&partnerID=8YFLogxK

U2 - 10.1109/TIP.2021.3053459

DO - 10.1109/TIP.2021.3053459

M3 - Article

C2 - 33502980

AN - SCOPUS:85100475217

SN - 1057-7149

VL - 30

SP - 2722

EP - 2733

JO - IEEE Transactions on Image Processing

JF - IEEE Transactions on Image Processing

M1 - 9337174

ER -
