TY - JOUR
T1 - Bridging CNN and Transformer with Cross-Attention Fusion Network for Hyperspectral Image Classification
AU - Xu, Fulin
AU - Mei, Shaohui
AU - Zhang, Ge
AU - Wang, Nan
AU - Du, Qian
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Feature representation is crucial for hyperspectral image (HSI) classification. However, existing convolutional neural network (CNN)-based methods are constrained by the convolution kernel and focus only on local features, causing them to ignore the global properties of HSIs. Transformer-based networks can compensate for this limitation because they emphasize the global features of HSIs. Combining the advantages of these two networks in feature extraction is therefore of great importance for improving classification accuracy. To this end, a cross-attention fusion network bridging CNN and Transformer (CAF-Former) is proposed, which fully exploits the CNN's strength in local feature extraction and the Transformer's capacity for learning long-range dependencies in hyperspectral classification. To fully explore the local and global information within an HSI, a Dynamic-CNN branch is proposed to effectively encode the local features of pixels, while a Gaussian Transformer branch is constructed to accurately model global features and long-range dependencies. Moreover, to enable full interaction between local and global features, a cross-attention fusion (CAF) module is proposed as a bridge to fuse the features extracted by the two branches. Experiments on several benchmark datasets demonstrate that the proposed CAF-Former significantly outperforms both CNN-based and Transformer-based state-of-the-art networks for HSI classification.
AB - Feature representation is crucial for hyperspectral image (HSI) classification. However, existing convolutional neural network (CNN)-based methods are constrained by the convolution kernel and focus only on local features, causing them to ignore the global properties of HSIs. Transformer-based networks can compensate for this limitation because they emphasize the global features of HSIs. Combining the advantages of these two networks in feature extraction is therefore of great importance for improving classification accuracy. To this end, a cross-attention fusion network bridging CNN and Transformer (CAF-Former) is proposed, which fully exploits the CNN's strength in local feature extraction and the Transformer's capacity for learning long-range dependencies in hyperspectral classification. To fully explore the local and global information within an HSI, a Dynamic-CNN branch is proposed to effectively encode the local features of pixels, while a Gaussian Transformer branch is constructed to accurately model global features and long-range dependencies. Moreover, to enable full interaction between local and global features, a cross-attention fusion (CAF) module is proposed as a bridge to fuse the features extracted by the two branches. Experiments on several benchmark datasets demonstrate that the proposed CAF-Former significantly outperforms both CNN-based and Transformer-based state-of-the-art networks for HSI classification.
KW - Convolutional neural network (CNN)
KW - feature fusion
KW - hyperspectral image (HSI) classification
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85197077372&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2024.3419266
DO - 10.1109/TGRS.2024.3419266
M3 - Article
AN - SCOPUS:85197077372
SN - 0196-2892
VL - 62
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5522214
ER -