TY - JOUR
T1 - Hierarchical Feature Fusion of Transformer with Patch Dilating for Remote Sensing Scene Classification
AU - Chen, Xiaoning
AU - Ma, Mingyang
AU - Li, Yong
AU - Mei, Shaohui
AU - Han, Zonghao
AU - Zhao, Jian
AU - Cheng, Wei
N1 - Publisher Copyright: © 1980-2012 IEEE.
PY - 2023
Y1 - 2023
AB - Recently, Transformer-based techniques have emerged as a promising solution for modeling contextual information in remote sensing (RS) scenes and have found widespread application in RS scene classification. However, making full use of the intermediate features learned in Transformers remains crucial for RS scene classification tasks. Therefore, this article proposes a hierarchical feature fusion of transformer with patch dilating (HFFT-PD), which aims to capture rich contextual information from hierarchical features to enhance the performance of RS scene classification. Specifically, the HFFT-PD model consists of a hierarchical transformer merging (HTM) block and a lightweight adaptive channel compression (LACC) module: the HTM is designed for the Transformer architecture to bridge the semantic gaps between features from different hierarchical blocks, and the LACC accounts for the significance of distinct channels in the final classification features. In addition, a new Patch Dilating strategy is designed for the Transformer paradigm, functioning as a reassembly operator based on patch features. In contrast to conventional upsampling techniques, Patch Dilating performs upsampling without requiring supplementary information while preserving the semantic content of local spatial structure. Extensive experiments conducted on the UC Merced land-use dataset (UCM), aerial image dataset (AID), and NWPU-45 datasets, with training ratios of 80%, 50%, and 20%, respectively, demonstrate that the proposed HFFT-PD outperforms the baseline by at least 0.59%, 0.44%, and 0.99%, respectively, showing its superiority over contemporary state-of-the-art methods.
KW - Feature fusion
KW - remote sensing (RS)
KW - scene classification
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85177087829&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2023.3331880
DO - 10.1109/TGRS.2023.3331880
M3 - Article
AN - SCOPUS:85177087829
SN - 0196-2892
VL - 61
SP - 1
EP - 16
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 4410516
ER -