特征提取策略对高分辨率遥感图像场景分类性能影响的评估

Translated title of the contribution: Evaluation of the effect of feature extraction strategy on the performance of high-resolution remote sensing image scene classification

Xiaoliang Qian, Jia Li, Gong Cheng, Xiwen Yao, Suna Zhao, Yibin Chen, Liying Jiang

Research output: Contribution to journal › Article › peer-review

25 Scopus citations

Abstract

Remote sensing image scene classification aims to tag remote sensing images with semantic categories according to their content and is important in disaster monitoring, environmental detection, and urban planning. Scene classification results can provide valuable information for object recognition and image retrieval and can effectively improve the performance of image interpretation. The general process of remote sensing image scene classification consists mainly of feature extraction and scene classification based on image features. Given that the design of classifiers is relatively mature, this work focuses on feature extraction strategies. However, the influence of the various strategies on scene classification performance lacks a unified evaluation, which limits their development. This study therefore evaluates the effect of various feature extraction strategies on the performance of high-resolution remote sensing image scene classification.

In the second section of this paper, existing feature extraction strategies are divided into two categories: (1) hand-designed and (2) data-driven feature extraction. Hand-designed features, such as Color Histograms (CH) and the Scale Invariant Feature Transform (SIFT), provide a primary description of images and were proposed in the early period of the field. A more abstract description of the images is obtained by encoding hand-designed features, for example with the Bag of Visual Words (BoVW) model, which achieves higher classification accuracy than the hand-designed features themselves. However, these strategies generally suffer from poor generalization capability because they are designed for specific requirements, and hand-designed features demand significant domain knowledge. By contrast, data-driven features can be learned directly from a large number of sample images and are generally divided into shallow and deep learning features. Shallow learning feature extraction mainly involves Principal Component Analysis (PCA), Independent Component Analysis (ICA), and sparse coding algorithms. Typical deep learning feature extraction strategies include the Stacked Autoencoder (SAE), the Deep Belief Network (DBN), and the Convolutional Neural Network (CNN). Compared with deep learning models, shallow learning models can be regarded as neural networks with a single hidden layer and thus cannot capture high-level semantic features; the superiority of deep learning features is obvious when dealing with complex scene classification. Furthermore, CNN-based features outperform SAE- and DBN-based features because the one-dimensional structure of SAE and DBN destroys the spatial information of images.

In the third section of this paper, 29 feature descriptors are quantitatively compared on the UC Merced, AID, and NWPU-RESISC45 datasets, and eight combinations of feature descriptors are quantitatively compared on the NWPU-RESISC45 dataset. The effect of different feature extraction strategies on scene classification performance and the complexity of each dataset are evaluated through this quantitative comparison. The experimental results are as follows. (1) The classification accuracy and stability of hand-designed features are poor; however, most of them are efficient and can attain better performance when combined with other types of features. (2) Among all feature extraction strategies, the encoding of hand-designed features achieves moderate classification accuracy, efficiency, and stability. (3) The classification accuracy and stability of data-driven features are the best, but most of them have low efficiency. (4) AlexNet, a deep learning model with few layers, exhibits the best overall performance and is suitable for applications that require high classification accuracy, efficiency, and stability. (5) Some scene classes belonging to the land-use type are easily confused because of similar landmark buildings or sites, and some scene classes belonging to the land-cover type are easily confused because of similar geomorphologic features. (6) The recently proposed NWPU-RESISC45 dataset is more complex than the other datasets and is more challenging for scene classification algorithms.

Finally, the summary and conclusions of this paper are presented, and future developments are discussed. On the one hand, combining the prior knowledge introduced by hand-designed features with the CNN model may be one of the future development directions; on the other hand, introducing Generative Adversarial Networks (GAN) into CNN training may become a research hotspot. In addition, remote sensing parameters such as NDVI and NDWI, as well as multi-spectral information, can be integrated with current feature extraction strategies for practical applications.
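The "feature extraction followed by a classifier" pipeline surveyed above can be made concrete with a short sketch. The Python code below is not from the paper: the file paths, the 16-bin histogram, the linear SVM, and the use of torchvision's ImageNet-pretrained AlexNet are all illustrative assumptions. It contrasts a hand-designed feature (a color histogram) with a data-driven feature (AlexNet penultimate-layer activations), each fed to the same linear classifier.

```python
# Minimal sketch (illustrative only): hand-designed vs. data-driven features
# for scene classification with a linear SVM. Assumes a recent torchvision
# (>= 0.13) with ImageNet-pretrained AlexNet weights available.
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms
from sklearn.svm import LinearSVC

def color_histogram(img, bins=16):
    """Hand-designed feature: per-channel RGB histogram, L1-normalised (48-D)."""
    arr = np.asarray(img.convert("RGB"))
    feats = [np.histogram(arr[..., c], bins=bins, range=(0, 255))[0] for c in range(3)]
    h = np.concatenate(feats).astype(np.float32)
    return h / (h.sum() + 1e-8)

# Data-driven feature: 4096-D activations from AlexNet's penultimate layer.
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def alexnet_feature(img):
    x = preprocess(img.convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        x = alexnet.features(x)
        x = alexnet.avgpool(x)
        x = torch.flatten(x, 1)
        x = alexnet.classifier[:-1](x)  # stop before the 1000-way ImageNet layer
    return x.squeeze(0).numpy()

def classify(train_paths, train_labels, test_paths, extractor):
    """Generic two-stage pipeline: extract features, then train a linear SVM."""
    X_train = np.stack([extractor(Image.open(p)) for p in train_paths])
    X_test = np.stack([extractor(Image.open(p)) for p in test_paths])
    clf = LinearSVC().fit(X_train, train_labels)
    return clf.predict(X_test)

# Usage (hypothetical file lists): swap the extractor to compare strategies.
# preds_hand = classify(train_paths, train_labels, test_paths, color_histogram)
# preds_deep = classify(train_paths, train_labels, test_paths, alexnet_feature)
```

Swapping only the extractor while keeping the classifier fixed mirrors the evaluation setup described in the abstract, where classifier design is held relatively constant and the feature extraction strategy is the variable under study.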

Original language: Chinese (Simplified)
Pages (from-to): 758-776
Number of pages: 19
Journal: Yaogan Xuebao/Journal of Remote Sensing
Volume: 22
Issue number: 5
State: Published - Sep 2018
