AutoRoadSAM: Multimodal Remote Sensing Road Extraction with Structure-Semantic Awareness via Auto-Prompting Vision Foundation Models

  • Jiayuan Li
  • Zhen Wang
  • Xiao Sun
  • Zhiyong Lv
  • Nan Xu
  • Zhuhong You
  • De-Shuang Huang

Research output: Contribution to journal › Article › peer-review

Abstract

The integration of multimodal data holds great promise for advancing road extraction in remote sensing. However, existing approaches are limited by the lack of unified end-to-end frameworks for diverse modality combinations, suboptimal multimodal feature fusion, and challenges in capturing the slender, winding, and complex topological structures of roads. In this article, we propose AutoRoadSAM, a novel end-to-end framework for multimodal road extraction that fully exploits the powerful visual representation capabilities of the segment anything model (SAM) and, for the first time, introduces an auto-prompting mechanism via a dynamic snake convolution-based decoder. This decoder adaptively generates task-specific prompts by capturing fine-grained local geometric features from auxiliary modality branches, enabling precise alignment with complex road structures. To further enhance multimodal feature fusion and topological perception, we design the cross-modal information interaction (CMII) module, which facilitates global context modeling and cross-modal interaction, while strengthening the representation of intricate road topology through multidirectional snake scanning. Moreover, we incorporate a mask decoder with cross-polarity-aware linear attention (CPLAM) to boost decoding efficiency and effectively address pixel imbalance. Together, these innovations enable AutoRoadSAM to achieve superior structure- and semantic-aware road extraction across diverse modality combinations. Extensive experiments on six public datasets and four modality combinations demonstrate that AutoRoadSAM consistently outperforms state-of-the-art methods, validating the effectiveness and generalization capability of each proposed component. The code is available at https://github.com/NWPUFranklee/AutoRoadSAM.git.
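The abstract describes an auto-prompting mechanism in which an auxiliary-modality branch generates task-specific prompt embeddings for the SAM mask decoder. The following is a minimal, purely illustrative NumPy sketch of that general idea: pooling an auxiliary feature map into a small set of prompt tokens via a learned projection (here replaced by random, untrained weights). The function name `auto_prompt`, all shapes, and the token count are assumptions for illustration, not the paper's actual implementation, which uses a dynamic snake convolution-based decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: an image embedding from the SAM encoder and an
# auxiliary-modality feature map (e.g., from a LiDAR or SAR branch).
C, H, W = 256, 64, 64
img_embed = rng.standard_normal((C, H, W))
aux_feat = rng.standard_normal((C, H, W))

def auto_prompt(aux, n_tokens=8):
    """Illustrative auto-prompting sketch: pool auxiliary-modality features
    into a small set of prompt tokens via a (random, untrained) projection
    followed by a spatial softmax. The real decoder in the paper instead
    uses dynamic snake convolutions to follow road geometry."""
    c, h, w = aux.shape
    flat = aux.reshape(c, h * w)                        # (C, HW)
    scores = rng.standard_normal((n_tokens, c)) @ flat  # (n_tokens, HW)
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)             # softmax over positions
    return attn @ flat.T                                # (n_tokens, C) prompts

prompts = auto_prompt(aux_feat)
print(prompts.shape)  # → (8, 256)
```

In the paper's framework, such prompt embeddings would be passed to the mask decoder alongside the image embedding, replacing the manual point/box prompts that SAM normally expects.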

Original language: English
Article number: 5607617
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 64
State: Published - 2026

Keywords

  • Auto-prompting
  • feature fusion
  • multimodal remote sensing
  • road extraction
  • vision foundation models
