Entity-Guided Attention Twisting Network for Referring Remote Sensing Image Segmentation

Yuyu Jia, Qing Zhou, Junyu Gao, Qi Wang

Research output: Contribution to journal (article, peer-reviewed)

Abstract

Referring remote sensing image segmentation (RRSIS) aims to establish pixel-level interpretation of specific regions queried by textual expressions, bridging textual semantics and intelligent analysis of remote sensing imagery. In contrast to natural scenarios, the intricate backgrounds in remote sensing scenarios result in low target–background contrast, often leading to semantic dispersion in segmented regions. Furthermore, conventional cross-attention-based referring image segmentation (RIS) methods struggle to bridge the modal gap, hindering fine-grained alignment between linguistic descriptions and geographical features. To overcome these challenges, we present an entity-guided attention twisting network (Enti-TwistNet) for RRSIS. Our framework first introduces a segment anything model (SAM)-inspired entity guidance (SEG) module that extracts spatially constrained entity prompts through a self-reasoning mask generation mechanism, constructing a comprehensive entity–visual–text tri-modal information cube. Subsequently, during cross-modal interaction, we propose a dual-phase attention-twisting (DAT) mechanism: 1) sequential channel-wise scanning first facilitates cross-modal semantic propagation (SP), and 2) attention is then twisted to the spatial dimension, integrating entity guidance to enhance the representation of irregular geographic boundaries. Extensive experiments on two widely used benchmarks, RefSegRS and RRSIS-D, demonstrate that Enti-TwistNet achieves significant performance improvements over existing state-of-the-art models.
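
The two-phase idea in the abstract (attend along channels first, then along spatial positions with entity guidance) can be illustrated with a small toy sketch. This is not the authors' DAT module: the function name `channel_then_spatial_attention`, the mean-pooled text query, the softmax channel weighting, and the log-mask bias are all assumptions chosen for illustration, and the paper's sequential channel-wise scanning is replaced here by a plain channel reweighting.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def channel_then_spatial_attention(visual, text, entity_mask):
    """Toy two-phase attention (hypothetical, not the paper's DAT).

    visual:      N pixels, each a list of C channel values
    text:        T tokens, each a list of C channel values
    entity_mask: N values in [0, 1], a stand-in for entity guidance
    Returns a list of N spatial attention weights that sum to 1.
    """
    n_channels = len(visual[0])

    # Assumption: pool the text tokens into a single C-dimensional query.
    text_query = [
        sum(token[c] for token in text) / len(text) for c in range(n_channels)
    ]

    # Phase 1 (channel dimension): weight each channel by how well its
    # average visual response agrees with the text query.
    channel_mean = [
        sum(pixel[c] for pixel in visual) / len(visual) for c in range(n_channels)
    ]
    channel_weights = softmax(
        [channel_mean[c] * text_query[c] for c in range(n_channels)]
    )
    reweighted = [
        [pixel[c] * channel_weights[c] for c in range(n_channels)]
        for pixel in visual
    ]

    # Phase 2 (spatial dimension): score each pixel against the text query
    # and bias the score with the log of the entity mask, so pixels outside
    # the entity region receive almost no attention.
    scores = [
        sum(pixel[c] * text_query[c] for c in range(n_channels))
        + math.log(mask + 1e-6)
        for pixel, mask in zip(reweighted, entity_mask)
    ]
    return softmax(scores)
```

With three pixels, one text token, and a mask that excludes the third pixel, the returned weights concentrate on the pixel that both matches the text and lies inside the mask.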

Original language: English
Article number: 5645610
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 63
State: Published, 2025

Keywords

  • Attention twisting
  • entity-aware guidance
  • referring segmentation
  • remote sensing
