TY - JOUR
T1 - MSSF-Net: A Multimodal Spectral–Spatial Feature Fusion Network for Hyperspectral Unmixing
T2 - IEEE Transactions on Geoscience and Remote Sensing
AU - Gao, Wei
AU - Zhang, Yu
AU - Akoudad, Youssef
AU - Chen, Jie
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Hyperspectral unmixing (HU) aims to decompose mixed pixels in remote sensing imagery into material-specific spectra and their respective abundance fractions. Recently, autoencoders (AEs) have made significant advances in HU due to their strong representational capabilities and ease of implementation. However, relying exclusively on feature extraction from a single-modality hyperspectral image (HSI) can fail to fully utilize both spatial and spectral information, thereby limiting the ability to distinguish objects in complex scenes. To address these limitations, we propose a multimodal spectral-spatial feature fusion network (MSSF-Net) for enhanced HU. The MSSF-Net adopts a dual-stream architecture to extract feature representations from complementary input modalities. Specifically, the hyperspectral subnetwork leverages a convolutional neural network (CNN) to capture spatial information, while the light detection and ranging (LiDAR) subnetwork incorporates an enhanced channel attention mechanism (ECAM) to capture the dynamic changes in spatial information across different channels. Furthermore, we introduce a cross-modal fusion (CMF) module that integrates spectral and spatial information across modalities, leading to more robust feature representations. Experimental results indicate that the MSSF-Net significantly outperforms existing traditional and deep learning (DL)-based methods in terms of unmixing accuracy.
KW - Attention
KW - autoencoder (AE)
KW - deep learning (DL)
KW - hyperspectral unmixing (HU)
KW - multimodal remote sensing image (MRSI)
UR - http://www.scopus.com/inward/record.url?scp=105003418434&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2025.3563647
DO - 10.1109/TGRS.2025.3563647
M3 - Article
AN - SCOPUS:105003418434
SN - 0196-2892
VL - 63
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5511515
ER -