Entity-Guided Attention Twisting Network for Referring Remote Sensing Image Segmentation

  • Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Referring remote sensing image segmentation (RRSIS) aims to establish pixel-level interpretation of specific regions queried by textual expressions, bridging textual semantics and intelligent analysis of remote sensing imagery. In contrast to natural scenarios, the intricate backgrounds in remote sensing scenarios result in low target–background contrast, often leading to semantic dispersion in segmented regions. Furthermore, conventional cross-attention-based referring image segmentation (RIS) methods struggle to bridge the modal gap, hindering fine-grained alignment between linguistic descriptions and geographical features. To overcome these challenges, we present a pioneering entity-guided attention twisting network (Enti-TwistNet) for RRSIS. Our framework first introduces a segment anything model (SAM)-inspired entity guidance (SEG) module that extracts spatially constrained entity prompts through a self-reasoning mask generation mechanism, constructing a comprehensive entity-visual–text tri-modal information cube. Subsequently, during cross-modal interaction, we propose a dual-phase attention-twisting (DAT) mechanism: 1) initially, sequential channel-wise scanning to facilitate cross-modal semantic propagation (SP) and 2) subsequently, twist attention to the spatial dimension, integrating entity guidance to enhance the representation of irregular geographic boundaries. Extensive experiments on two widely used benchmarks, RefSegRS and RRSIS-D, demonstrate that the Enti-TwistNet achieves significant performance improvements over existing state-of-the-art models.

Original language: English
Article number: 5645610
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 63
DOI
Publication status: Published - 2025
