TY - JOUR
T1 - MSSF-Net: A Multimodal Spectral–Spatial Feature Fusion Network for Hyperspectral Unmixing
T2 - IEEE Transactions on Geoscience and Remote Sensing
AU - Gao, Wei
AU - Zhang, Yu
AU - Akoudad, Youssef
AU - Chen, Jie
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Hyperspectral unmixing (HU) aims to decompose mixed pixels in remote sensing imagery into material-specific spectra and their respective abundance fractions. Recently, autoencoders (AEs) have made significant advances in HU due to their strong representational capabilities and ease of implementation. However, relying exclusively on feature extraction from a single-modality hyperspectral image (HSI) can fail to fully utilize both spatial and spectral information, thereby limiting the ability to distinguish objects in complex scenes. To address these limitations, we propose a multimodal spectral-spatial feature fusion network (MSSF-Net) for enhanced HU. The MSSF-Net adopts a dual-stream architecture to extract feature representations from complementary input modalities. Specifically, the hyperspectral subnetwork leverages a convolutional neural network (CNN) to capture spatial information, while the light detection and ranging (LiDAR) subnetwork incorporates an enhanced channel attention mechanism (ECAM) to capture the dynamic changes in spatial information across different channels. Furthermore, we introduce a cross-modal fusion (CMF) module that integrates spectral and spatial information across modalities, leading to more robust feature representations. Experimental results indicate that the MSSF-Net significantly outperforms existing traditional and deep learning (DL)-based methods in terms of unmixing accuracy.
KW - Attention
KW - autoencoder (AE)
KW - deep learning (DL)
KW - hyperspectral unmixing (HU)
KW - multimodal remote sensing image (MRSI)
UR - http://www.scopus.com/inward/record.url?scp=105003418434&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2025.3563647
DO - 10.1109/TGRS.2025.3563647
M3 - Article
AN - SCOPUS:105003418434
SN - 0196-2892
VL - 63
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5511515
ER -