Hierarchical textual-visual guidance for referring remote sensing segmentation

  • Northwestern Polytechnical University Xian

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Referring Remote Sensing Image Segmentation (RRSIS) aims to precisely segment regions in remote sensing images based on natural language expressions. However, a central challenge lies in language-visual ambiguity, as remote sensing expressions often involve property-dense functional categories and implicit spatial relations, while the corresponding images simultaneously present substantial scale variation and intricate spatial layouts. Existing methods struggle to effectively ground complex textual semantics within intricate remote sensing images. To address this challenge, we propose a method from the perspective of hierarchical textual-visual guidance. Specifically, we design a Textual Semantic Parsing Module (TSPM), which disambiguates complex referring expressions by transforming them into hierarchical attributes encompassing category recognition, spatial constraints, relational semantics, and intrinsic properties, thereby providing explicit cues for visual grounding. Building upon these structured cues, we further develop an Adaptive Visual-aware Modulation Module (AVMM), which integrates Dual-Path hierarchical Visual Feature Extraction and Dynamic Convolutional Perception Mechanism to adaptively modulate features under the hierarchical textual guidance from TSPM. Through the joint effect of TSPM and AVMM, our approach effectively bridges the gap caused by language-visual ambiguity. The proposed method is evaluated on two public RRSIS datasets, achieving state-of-the-art performance with mIoU scores of 68.81% on RefSegRS and 64.82% on RRSIS-D.
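
The mIoU figures quoted above can be read against a minimal sketch of the metric. It assumes mIoU is the mean of per-sample intersection-over-union between predicted and ground-truth binary masks, which is a common convention in referring segmentation; the abstract does not spell out the paper's exact evaluation protocol. The names `iou` and `mean_iou`, and the handling of two empty masks, are illustrative choices and are not taken from the paper.

```python
from typing import Sequence

# A binary mask represented as rows of 0/1 values (illustrative type alias).
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Mean of per-sample IoU over a non-empty set of mask pairs."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    preds = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
    targets = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
    print(mean_iou(preds, targets))
```

Under this convention every referring expression contributes equally to the score, regardless of how large the referred object is.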

Original language: English
Article number: 113579
Journal: Pattern Recognition
Volume: 179
Publication status: Published - Nov 2026
