TY - JOUR
T1 - InterMamba
T2 - A Visual-Prompted Interactive Framework for Dense Object Detection and Annotation
AU - Liu, Shanji
AU - Yang, Zhigang
AU - Li, Qiang
AU - Wang, Qi
N1 - Publisher Copyright: © 1980-2012 IEEE.
PY - 2025
Y1 - 2025
AB - Existing object detection methods are constrained by high annotation costs, particularly in remote sensing, owing to the diversity of targets and the large scale of the data. Visual-Prompted Interactive Object Detection can make data annotation more efficient by using user-provided visual prompts to iteratively refine detection results. However, current interactive annotation frameworks rely on simple feature fusion strategies, which limits their ability to capture fine-grained semantic relationships. Moreover, more advanced fusion methods are computationally expensive, making them unsuitable for the high-resolution feature spaces common in remote sensing imagery. To address these limitations, we propose InterMamba, an efficient framework for interactive object detection in remote sensing images. InterMamba integrates the VMamba backbone with a novel Cross Vision Selective Scan Module (Cross-VSSM) to achieve linear-complexity multi-scale feature fusion, reducing memory consumption while capturing fine-grained details in high-resolution feature spaces. To further improve interaction flexibility and detection precision, we propose a hybrid Gaussian heatmap generation method that encodes user-provided point and bounding box annotations. In addition, a User Interaction Loss function improves detection accuracy in dense scenes by aligning localization and classification with user guidance. Experiments demonstrate that InterMamba consistently outperforms existing methods in mean Average Precision (mAP). By improving precision and reducing annotation costs, InterMamba offers a robust solution for interactive remote sensing object detection.
KW - Cross vision selective scan
KW - Remote sensing
KW - Visual-prompted interactive object detection
KW - VMamba
UR - http://www.scopus.com/inward/record.url?scp=105002685417&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2025.3559798
DO - 10.1109/TGRS.2025.3559798
M3 - Article
AN - SCOPUS:105002685417
SN - 0196-2892
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
ER -