Global–Local Mamba-Based Dual-Modality Fusion for Hyperspectral and LiDAR Data Classification

  • Khanzada Muzammil Hussain
  • Keyun Zhao
  • Sachal Pervaiz
  • Ying Li

Research output: Contribution to journal › Article › peer-review

Abstract

Highlights

What are the main findings?

  • We propose GL-Mamba, a frequency-aware dual-modality fusion network that combines low-/high-frequency decomposition, global–local Mamba blocks, and cross-attention to jointly exploit hyperspectral and LiDAR information for land-cover classification.
  • GL-Mamba achieves state-of-the-art performance on the Trento, Augsburg, and Houston2013 benchmarks, with overall accuracies of 99.71%, 94.58%, and 99.60%, respectively, while producing smoother and more coherent classification maps than recent CNN-, transformer-, and Mamba-based baselines.

What are the implications of the main findings?

  • The results demonstrate that linear-complexity Mamba state-space models are a competitive and efficient alternative to heavy transformer architectures for large-scale multimodal remote sensing, enabling accurate HSI–LiDAR fusion under practical computational constraints.
  • The proposed frequency-aware and cross-modal design can be extended to other sensor combinations and tasks (e.g., multispectral–LiDAR mapping, change detection), providing a general blueprint for building scalable and robust multimodal networks in remote sensing applications.

Hyperspectral image (HSI) and light detection and ranging (LiDAR) data offer complementary spectral and structural information; however, integrating these high-dimensional, heterogeneous modalities poses significant challenges. We propose a Global–Local Mamba dual-modality fusion framework (GL-Mamba) for HSI–LiDAR classification. Each sensor's input is decomposed into low- and high-frequency sub-bands: lightweight 3D/2D CNNs process low-frequency spectral–spatial structures, while compact transformers handle high-frequency details. The outputs are aggregated by a global–local Mamba block, a state-space sequence model that retains local context while capturing long-range dependencies with linear complexity. A cross-attention module aligns spectral and elevation features, yielding a lightweight, efficient architecture that preserves both fine textures and coarse structures. Experiments on the Trento, Augsburg, and Houston2013 datasets show that GL-Mamba outperforms eight leading baselines in accuracy and kappa coefficient while maintaining high inference speed thanks to its dual-frequency design. These results highlight the practicality and accuracy of our model for multimodal remote-sensing applications.
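To make the pipeline described in the abstract concrete, the minimal PyTorch sketch below illustrates its three ingredients: a low-/high-frequency split, a linear-time state-space scan standing in for the global–local Mamba block, and cross-attention fusion of HSI and LiDAR tokens. All module names, dimensions, and the simplified diagonal SSM recurrence are illustrative assumptions for exposition, not the authors' implementation.

```python
# Illustrative sketch only: simplified stand-ins for the components named in
# the abstract (frequency split, Mamba-style scan, cross-attention fusion).
import torch
import torch.nn as nn
import torch.nn.functional as F


def frequency_split(x, kernel_size=5):
    """Split a (B, C, H, W) image into low- and high-frequency parts,
    using average pooling as a crude low-pass filter (an assumption;
    the paper's decomposition may differ)."""
    low = F.avg_pool2d(x, kernel_size, stride=1, padding=kernel_size // 2)
    high = x - low
    return low, high


class SimpleSSM(nn.Module):
    """Toy diagonal linear state-space scan standing in for a Mamba block:
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t. Cost is linear in length."""
    def __init__(self, dim):
        super().__init__()
        self.log_a = nn.Parameter(torch.zeros(dim))  # per-channel decay
        self.b = nn.Parameter(torch.ones(dim))
        self.c = nn.Parameter(torch.ones(dim))

    def forward(self, x):                      # x: (B, L, D)
        a = torch.sigmoid(self.log_a)          # keep the recurrence stable
        h = torch.zeros_like(x[:, 0])
        ys = []
        for t in range(x.size(1)):
            h = a * h + self.b * x[:, t]
            ys.append(self.c * h)
        return torch.stack(ys, dim=1)


class CrossAttentionFusion(nn.Module):
    """HSI tokens attend to LiDAR tokens; could be applied symmetrically."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, hsi_tokens, lidar_tokens):
        fused, _ = self.attn(hsi_tokens, lidar_tokens, lidar_tokens)
        return fused + hsi_tokens              # residual connection


# Usage on dummy data: a 64-band HSI patch and a 1-channel LiDAR DSM patch.
B, H, W, D = 2, 16, 16, 32
hsi = torch.randn(B, 64, H, W)
lidar = torch.randn(B, 1, H, W)
hsi_low, _ = frequency_split(hsi)
tokens_h = nn.Conv2d(64, D, 1)(hsi_low).flatten(2).transpose(1, 2)  # (B, H*W, D)
tokens_l = nn.Conv2d(1, D, 1)(lidar).flatten(2).transpose(1, 2)
tokens_h = SimpleSSM(D)(tokens_h)              # global-context scan over tokens
fused = CrossAttentionFusion(D)(tokens_h, tokens_l)
print(fused.shape)                             # torch.Size([2, 256, 32])
```

The sketch keeps only the data flow: decompose, tokenize, scan with a linear-complexity recurrence, then align modalities via cross-attention. The actual GL-Mamba uses 3D/2D CNN and transformer branches per sub-band and a global–local Mamba formulation not reproduced here.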

Original language: English
Article number: 138
Journal: Remote Sensing
Volume: 18
Issue number: 1
DOIs
State: Published - Jan 2026

Keywords

  • cross attention
  • deep learning
  • hyperspectral image
  • LiDAR
  • Mamba
  • multimodal fusion
  • remote sensing
