
AutoSAM: Auto-Prompting Mamba-Based Vision Foundation Model for Multimodal Remote Sensing Semantic Segmentation

  • Jiayuan Li
  • Zhen Wang
  • Xiao Sun
  • Nan Xu
  • Zhuhong You
  • Deshuang Huang
  • Northwestern Polytechnical University Xian
  • Xijing University
  • Hohai University
  • Guangxi Academy of Agricultural Sciences

Research output: Contribution to journal, Article, peer-reviewed

2 citations (Scopus)

Abstract

Vision foundation models, such as the segment anything model (SAM), have advanced remote sensing (RS) tasks. However, extending SAM to multimodal RS semantic segmentation faces two key challenges: 1) SAM is tailored for unimodal inputs and lacks RS-specific knowledge, hindering effective spatial modeling and cross-modal feature integration; and 2) SAM depends on externally provided prompts (e.g., points or boxes), limiting its scalability and practicality in multimodal scenarios. To address these issues, we present AutoSAM, an end-to-end auto-prompting Mamba-based vision foundation model framework tailored for multimodal RS semantic segmentation. Specifically, we introduce a CrossMamba block (CMB) in the feature extraction stage to replace the conventional multihead self-attention mechanism, where the core reverse interactive scanning adaptor-SS2D module (RISASM) promotes semantic interaction and alleviates modality discrepancies. In addition, a multimodal scale-aware fusion module (MSAFM) is incorporated to enhance scale-aware fusion and suppress irrelevant features through cascaded residual interactions. Furthermore, we propose a plug-and-play multimodal mixture-of-class-expert auto-prompting module (MMoEAPM), which enables the generation of pseudo-mask prompts for the original prompt encoder without additional training overhead, thereby supporting efficient auto-prompting. Extensive experiments and ablation studies on four benchmark multimodal RS datasets demonstrate that AutoSAM consistently achieves state-of-the-art performance across diverse modality combinations.

Original language: English
Article number: 5612421
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 64
DOI
Publication status: Published - 2026
