Highlights
What are the main findings?
- We propose GL-Mamba, a frequency-aware dual-modality fusion network that combines low-/high-frequency decomposition, global–local Mamba blocks, and cross-attention to jointly exploit hyperspectral and LiDAR information for land-cover classification.
- GL-Mamba achieves state-of-the-art performance on the Trento, Augsburg, and Houston2013 benchmarks, with overall accuracies of 99.71%, 94.58%, and 99.60%, respectively, while producing smoother and more coherent classification maps than recent CNN-, transformer-, and Mamba-based baselines.
What are the implications of the main findings?
- The results demonstrate that linear-complexity Mamba state-space models are a competitive and efficient alternative to heavy transformer architectures for large-scale multimodal remote sensing, enabling accurate HSI–LiDAR fusion under practical computational constraints.
- The proposed frequency-aware and cross-modal design can be extended to other sensor combinations and tasks (e.g., multispectral–LiDAR mapping, change detection), providing a general blueprint for building scalable and robust multimodal networks in remote sensing applications.
Abstract
Hyperspectral image (HSI) and light detection and ranging (LiDAR) data offer complementary spectral and structural information; however, the integration of these high-dimensional, heterogeneous modalities poses significant challenges. We propose a Global–Local Mamba dual-modality fusion framework (GL-Mamba) for HSI–LiDAR classification. Each sensor’s input is decomposed into low- and high-frequency sub-bands: lightweight 3D/2D CNNs process low-frequency spectral–spatial structures, while compact transformers handle high-frequency details. The outputs are aggregated using a global–local Mamba block, a state-space sequence model that retains local context while capturing long-range dependencies with linear complexity.
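The linear-complexity claim for the Mamba block comes from the fact that a state-space sequence model is evaluated as a single recurrent scan over the token sequence. The following is a minimal, illustrative sketch of such a diagonal SSM scan with fixed toy parameters; it is not the paper's implementation (Mamba additionally uses learned, input-dependent parameters), only a demonstration of why the cost grows linearly with sequence length.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Minimal diagonal state-space recurrence:
        h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t
    One pass over the sequence -> O(T * d) time, i.e. linear in length T
    (illustrative of a Mamba-style scan; real Mamba uses selective,
    input-dependent parameters rather than the fixed vectors used here)."""
    h = np.zeros_like(a)
    ys = []
    for x_t in x:              # T sequential steps
        h = a * h + b * x_t    # state carries long-range context
        ys.append(c * h)
    return np.stack(ys)

T, d = 100, 8                          # toy sequence length and channel width
x = np.random.default_rng(1).random((T, d))
a = np.full(d, 0.9)                    # decay sets the effective memory length
b = np.ones(d)
c = np.ones(d)
y = ssm_scan(x, a, b, c)
print(y.shape)  # (100, 8)
```

Because each step touches only the current token and a fixed-size state, doubling the sequence length doubles the work, in contrast to the quadratic token–token interactions of self-attention.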
A cross-attention module aligns spectral and elevation features, yielding a lightweight, efficient architecture that preserves fine textures and coarse structures. Experiments on Trento, Augsburg, and Houston2013 datasets show that GL-Mamba outperforms eight leading baselines in accuracy and kappa coefficient, while maintaining high inference speed due to its dual-frequency design. These results highlight the practicality and accuracy of our model for multimodal remote-sensing applications.
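The two core ingredients described above, frequency decomposition of each modality and cross-attention alignment between modalities, can be sketched in a few lines. The code below is an illustrative NumPy toy, not the authors' network: the mean-blur low-pass filter stands in for the paper's sub-band decomposition, and the token arrays are hypothetical stand-ins for HSI and LiDAR patch features.

```python
import numpy as np

def frequency_split(x, k=3):
    """Split a 2D band into low-/high-frequency parts via a k x k mean blur.
    (Simple stand-in for the paper's sub-band decomposition; the split is
    exactly invertible since high = x - low.)"""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    h, w = x.shape
    low = np.empty_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            low[i, j] = xp[i:i + k, j:j + k].mean()
    return low, x - low

def cross_attention(q_feats, kv_feats):
    """Single-head cross-attention: queries from one modality attend to
    keys/values from the other, aligning the two feature sets."""
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # rows sum to 1
    return attn @ kv_feats

rng = np.random.default_rng(0)
band = rng.random((8, 8))                  # toy single-band image
low, high = frequency_split(band)

hsi_tokens = rng.random((4, 16))           # hypothetical HSI patch tokens
lidar_tokens = rng.random((4, 16))         # hypothetical LiDAR patch tokens
fused = cross_attention(hsi_tokens, lidar_tokens)
print(fused.shape)  # (4, 16)
```

In the full model the low- and high-frequency branches would be processed by the CNN and transformer sub-networks before cross-attention; here the sketch only shows how the decomposition and the modality alignment operate.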
| Original language | English |
|---|---|
| Article number | 138 |
| Journal | Remote Sensing |
| Volume | 18 |
| Issue number | 1 |
| DOIs | |
| State | Published - Jan 2026 |
Keywords
- cross attention
- deep learning
- hyperspectral image
- LiDAR
- Mamba
- multimodal fusion
- remote sensing
Fingerprint
Dive into the research topics of 'Global–Local Mamba-Based Dual-Modality Fusion for Hyperspectral and LiDAR Data Classification'.