Abstract
Referring Remote Sensing Image Segmentation (RRSIS) aims to precisely segment regions in remote sensing images based on natural language expressions. A central challenge is language-visual ambiguity: remote sensing expressions often involve property-dense functional categories and implicit spatial relations, while the corresponding images present substantial scale variation and intricate spatial layouts. Existing methods struggle to ground such complex textual semantics in these images. To address this challenge, we propose a method based on hierarchical textual-visual guidance. Specifically, we design a Textual Semantic Parsing Module (TSPM), which disambiguates complex referring expressions by transforming them into hierarchical attributes covering category recognition, spatial constraints, relational semantics, and intrinsic properties, thereby providing explicit cues for visual grounding. Building on these structured cues, we further develop an Adaptive Visual-aware Modulation Module (AVMM), which integrates Dual-Path hierarchical Visual Feature Extraction and a Dynamic Convolutional Perception Mechanism to adaptively modulate features under the hierarchical textual guidance from TSPM. Through the joint effect of TSPM and AVMM, our approach bridges the gap caused by language-visual ambiguity. The proposed method is evaluated on two public RRSIS datasets and achieves state-of-the-art performance, with mIoU scores of 68.81% on RefSegRS and 64.82% on RRSIS-D.
| Original language | English |
|---|---|
| Article number | 113579 |
| Journal | Pattern Recognition |
| Volume | 179 |
| State | Published - Nov 2026 |
Keywords
- Hierarchical textual-visual guidance
- Referring image segmentation
- Remote sensing