TY - JOUR
T1 - Looking Closer at the Scene: Multiscale Representation Learning for Remote Sensing Image Scene Classification
AU - Wang, Qi
AU - Huang, Wei
AU - Xiong, Zhitong
AU - Li, Xuelong
N1 - Publisher Copyright: © 2012 IEEE.
PY - 2022/4/1
Y1 - 2022/4/1
AB - Remote sensing image scene classification has attracted great attention because of its wide range of applications. Although convolutional neural network (CNN)-based methods for scene classification have achieved excellent results, the large scale variation of features and objects in remote sensing images limits further improvement of classification performance. To address this issue, we present a multiscale representation for scene classification, realized by a global-local two-stream architecture. The architecture has two branches, a global stream and a local stream, which extract global features from the whole image and local features from its most important area, respectively. To locate the most important area in the whole image using only image-level labels, a weakly supervised key area detection strategy, structured key area localization (SKAL), is designed to connect the two streams. To verify the effectiveness of the proposed SKAL-based two-stream architecture, we conduct comparative experiments based on three widely used CNN models (AlexNet, GoogLeNet, and ResNet18) on four public remote sensing image scene classification data sets and achieve state-of-the-art results on all four. Our code is available at https://github.com/hw2hwei/SKAL.
KW - Convolutional neural network (CNN)
KW - Multiscale representation
KW - Remote sensing
KW - Scene classification
KW - Structured key area localization (SKAL)
UR - http://www.scopus.com/inward/record.url?scp=85098776900&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2020.3042276
DO - 10.1109/TNNLS.2020.3042276
M3 - Article
C2 - 33332278
AN - SCOPUS:85098776900
SN - 2162-237X
VL - 33
SP - 1414
EP - 1428
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 4
ER -