TY - JOUR
T1 - Bridging CNN and Transformer with Cross-Attention Fusion Network for Hyperspectral Image Classification
AU - Xu, Fulin
AU - Mei, Shaohui
AU - Zhang, Ge
AU - Wang, Nan
AU - Du, Qian
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Feature representation is crucial for hyperspectral image (HSI) classification. However, existing convolutional neural network (CNN)-based methods are constrained by the convolution kernel and focus only on local features, causing them to ignore the global properties of HSIs. Transformer-based networks can compensate for this limitation because they emphasize the global features of HSIs. Combining the advantages of these two networks in feature extraction is therefore of great importance for improving classification accuracy. To this end, a cross-attention fusion network bridging CNN and Transformer (CAF-Former) is proposed, which fully exploits the CNN's strength in local feature extraction and the Transformer's capacity for learning long-range dependencies in hyperspectral classification. To fully explore the local and global information within an HSI, a Dynamic-CNN branch is proposed to effectively encode the local features of pixels, while a Gaussian Transformer branch is constructed to accurately model global features and long-range dependencies. Moreover, to enable full interaction between local and global features, a cross-attention fusion (CAF) module is proposed as a bridge to fuse the features extracted by the two branches. Experiments on several benchmark datasets demonstrate that the proposed CAF-Former significantly outperforms both CNN-based and Transformer-based state-of-the-art networks for HSI classification.
AB - Feature representation is crucial for hyperspectral image (HSI) classification. However, existing convolutional neural network (CNN)-based methods are constrained by the convolution kernel and focus only on local features, causing them to ignore the global properties of HSIs. Transformer-based networks can compensate for this limitation because they emphasize the global features of HSIs. Combining the advantages of these two networks in feature extraction is therefore of great importance for improving classification accuracy. To this end, a cross-attention fusion network bridging CNN and Transformer (CAF-Former) is proposed, which fully exploits the CNN's strength in local feature extraction and the Transformer's capacity for learning long-range dependencies in hyperspectral classification. To fully explore the local and global information within an HSI, a Dynamic-CNN branch is proposed to effectively encode the local features of pixels, while a Gaussian Transformer branch is constructed to accurately model global features and long-range dependencies. Moreover, to enable full interaction between local and global features, a cross-attention fusion (CAF) module is proposed as a bridge to fuse the features extracted by the two branches. Experiments on several benchmark datasets demonstrate that the proposed CAF-Former significantly outperforms both CNN-based and Transformer-based state-of-the-art networks for HSI classification.
KW - Convolutional neural network (CNN)
KW - feature fusion
KW - hyperspectral image (HSI) classification
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85197077372&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2024.3419266
DO - 10.1109/TGRS.2024.3419266
M3 - Article
AN - SCOPUS:85197077372
SN - 0196-2892
VL - 62
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5522214
ER -