InterMamba: A Visual-Prompted Interactive Framework for Dense Object Detection and Annotation

Shanji Liu, Zhigang Yang, Qiang Li, Qi Wang

Research output: Contribution to journal › Article › peer-review

Abstract

Existing object detection methods are constrained by high annotation costs. This is especially true in remote sensing, where targets are diverse and datasets are large. Visual-prompted interactive object detection can improve annotation efficiency by using user-provided visual prompts to iteratively refine detection results. However, current interactive annotation frameworks rely on simple feature fusion strategies, which limits their ability to capture fine-grained semantic relationships. More advanced fusion methods, in turn, are computationally expensive and therefore impractical for the high-resolution feature spaces common in remote sensing imagery. To address these limitations, we propose InterMamba, an efficient framework for interactive object detection in remote sensing images. InterMamba integrates the VMamba backbone with a novel Cross Vision Selective Scan Module (Cross-VSSM) to achieve multi-scale feature fusion with linear complexity, reducing memory consumption while capturing fine-grained details in high-resolution feature spaces. To further improve interaction flexibility and detection precision, we propose a hybrid Gaussian heatmap generation method that encodes user-provided point and bounding-box annotations. In addition, a User Interaction Loss improves detection accuracy in dense scenes by aligning localization and classification with user guidance. Experiments show that InterMamba consistently outperforms existing methods in mean Average Precision (mAP). By improving precision while reducing annotation costs, InterMamba provides a robust solution for interactive remote sensing object detection.
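The abstract does not specify how the hybrid Gaussian heatmap is built. The sketch below illustrates one common design and is an assumption, not the paper's method. Each point click becomes an isotropic Gaussian with a fixed width. Each bounding box becomes an anisotropic Gaussian centered on the box, with widths proportional to the box's size. Overlapping prompts are merged with an element-wise maximum. The names `hybrid_gaussian_heatmap`, `point_sigma` and `box_alpha` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]              # (x, y) click in pixel coordinates
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixel coordinates


def gaussian(x: float, y: float, cx: float, cy: float, sx: float, sy: float) -> float:
    """Unnormalized 2D Gaussian whose peak value is 1.0 at (cx, cy)."""
    return math.exp(-((x - cx) ** 2 / (2 * sx ** 2) + (y - cy) ** 2 / (2 * sy ** 2)))


def hybrid_gaussian_heatmap(
    height: int,
    width: int,
    points: Sequence[Point] = (),
    boxes: Sequence[Box] = (),
    point_sigma: float = 2.0,  # hypothetical fixed spread for point clicks
    box_alpha: float = 0.54,   # hypothetical scale factor for box-derived spreads
) -> List[List[float]]:
    """Encode point and box prompts into a single heatmap (assumed scheme, not the paper's)."""
    heat = [[0.0] * width for _ in range(height)]
    prompts = []  # (cx, cy, sigma_x, sigma_y) per prompt
    for px, py in points:
        # Point prompt: isotropic Gaussian centered on the click.
        prompts.append((px, py, point_sigma, point_sigma))
    for x1, y1, x2, y2 in boxes:
        # Box prompt: anisotropic Gaussian at the box center; spread grows with
        # box width/height, floored at 1 pixel to avoid degenerate peaks.
        sx = max(box_alpha * (x2 - x1) / 6, 1.0)
        sy = max(box_alpha * (y2 - y1) / 6, 1.0)
        prompts.append(((x1 + x2) / 2, (y1 + y2) / 2, sx, sy))
    for cx, cy, sx, sy in prompts:
        for y in range(height):
            for x in range(width):
                # Element-wise max keeps every prompt's peak at exactly 1.0.
                heat[y][x] = max(heat[y][x], gaussian(x, y, cx, cy, sx, sy))
    return heat


hm = hybrid_gaussian_heatmap(32, 32, points=[(5, 5)], boxes=[(10, 10, 30, 20)])
print(hm[5][5], hm[15][20])  # prints "1.0 1.0": the peaks of the point and box prompts
```

Merging with a maximum rather than a sum is a deliberate choice in this sketch. In dense scenes, adjacent prompts then keep separate peaks of equal height instead of adding up into one brighter blob. Whether InterMamba does the same is not stated in the abstract.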

Original language: English
Journal: IEEE Transactions on Geoscience and Remote Sensing
State: Accepted/In press, 2025

Keywords

  • Cross vision selective scan
  • Remote sensing
  • Visual-prompted interactive object detection
  • VMamba
