TY - JOUR
T1 - HMF-Former: Spatio-Spectral Transformer for Hyperspectral and Multispectral Image Fusion
T2 - IEEE Geoscience and Remote Sensing Letters
AU - You, Tengfei
AU - Wu, Chanyue
AU - Bai, Yunpeng
AU - Wang, Dong
AU - Ge, Huibin
AU - Li, Ying
N1 - Publisher Copyright:
© 2012 IEEE.
PY - 2023
Y1 - 2023
AB - The key to hyperspectral image (HSI) and multispectral image (MSI) fusion is to exploit the interspectral self-similarity of HSIs and the spatial correlations of MSIs. However, leading convolutional neural network (CNN)-based methods fall short in capturing long-range dependencies and self-similarity priors. To this end, we propose a simple yet efficient Transformer-based network, the hyperspectral and multispectral image fusion (HMF)-Former, for HSI/MSI fusion. The HMF-Former adopts a U-shaped architecture with a spatio-spectral Transformer block (SSTB) as the basic unit. In the SSTB, embedded spatial-wise multihead self-attention (Spa-MSA) and spectral-wise multihead self-attention (Spe-MSA) effectively capture interactions among spatial regions and interspectral dependencies, respectively, consistent with the spatial correlations of MSIs and the interspectral self-similarity of HSIs. In addition, the specially designed SSTB enables the HMF-Former to capture both local and global features while maintaining linear complexity. Extensive experiments on four benchmark datasets show that our method significantly outperforms state-of-the-art methods.
KW - Hyperspectral image (HSI) and multispectral image (MSI) fusion
KW - multihead self-attention (MSA)
KW - remote sensing
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85144786324&partnerID=8YFLogxK
DO - 10.1109/LGRS.2022.3229692
M3 - Article
AN - SCOPUS:85144786324
SN - 1545-598X
VL - 20
JO - IEEE Geoscience and Remote Sensing Letters
JF - IEEE Geoscience and Remote Sensing Letters
M1 - 5500505
ER -