TY - JOUR
T1 - Fine-Tuning SAM for Forward-Looking Sonar with Collaborative Prompts and Embedding
AU - Li, Jiayuan
AU - Wang, Zhen
AU - Xu, Nan
AU - You, Zhuhong
N1 - Publisher Copyright:
© 2004-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - The Segment Anything Model (SAM) represents a significant advancement in semantic segmentation, particularly for natural images, but encounters notable limitations when applied to forward-looking sonar (FLS) images. The primary challenges are the inherent boundary ambiguity of FLS images, which complicates the use of prompt strategies for accurate boundary delineation, and the lack of effective interaction between prompts and image features. In this letter, we introduce a collaborative prompting strategy that addresses these issues by generating dense prompt embeddings and sonar tokens focused on contour and boundary features, which replace SAM's original dense prompt embedding and IoU token. To further enhance segmentation, we employ embedding compensation techniques based on Mamba and Kolmogorov-Arnold Networks (KAN), which add boundary information to the image embeddings and improve the fusion of prompts within them. We conducted comprehensive experiments, including comparative analyses and ablation studies, to validate the superiority of our proposed approach. Results show that our method significantly improves segmentation performance on FLS images, effectively addressing boundary ambiguity and optimizing prompt utilization.
AB - The Segment Anything Model (SAM) represents a significant advancement in semantic segmentation, particularly for natural images, but encounters notable limitations when applied to forward-looking sonar (FLS) images. The primary challenges are the inherent boundary ambiguity of FLS images, which complicates the use of prompt strategies for accurate boundary delineation, and the lack of effective interaction between prompts and image features. In this letter, we introduce a collaborative prompting strategy that addresses these issues by generating dense prompt embeddings and sonar tokens focused on contour and boundary features, which replace SAM's original dense prompt embedding and IoU token. To further enhance segmentation, we employ embedding compensation techniques based on Mamba and Kolmogorov-Arnold Networks (KAN), which add boundary information to the image embeddings and improve the fusion of prompts within them. We conducted comprehensive experiments, including comparative analyses and ablation studies, to validate the superiority of our proposed approach. Results show that our method significantly improves segmentation performance on FLS images, effectively addressing boundary ambiguity and optimizing prompt utilization.
KW - Semantic segmentation
KW - collaborative prompting
KW - embedding compensation
KW - forward-looking sonar (FLS)
KW - multimodal remote sensing
UR - http://www.scopus.com/inward/record.url?scp=105002850971&partnerID=8YFLogxK
U2 - 10.1109/LGRS.2025.3562182
DO - 10.1109/LGRS.2025.3562182
M3 - Article
AN - SCOPUS:105002850971
SN - 1545-598X
JO - IEEE Geoscience and Remote Sensing Letters
JF - IEEE Geoscience and Remote Sensing Letters
ER -