Fine-Tuning SAM for Forward-Looking Sonar with Collaborative Prompts and Embedding

Jiayuan Li; Zhen Wang; Nan Xu; Zhuhong You

doi:10.1109/LGRS.2025.3562182

School of Computer Science

Research output: Contribution to journal › Article › peer-review

Abstract

The Segment Anything Model (SAM) represents a significant advancement in semantic segmentation, particularly for natural images, but encounters notable limitations when applied to forward-looking sonar (FLS) images. The primary challenges lie in the inherent boundary ambiguity of FLS images, which complicates the use of prompt strategies for accurate boundary delineation, and in the lack of effective interaction between prompts and image features. In this letter, we introduce a collaborative prompting strategy that addresses these issues by generating dense prompt embeddings and sonar tokens that focus on contour and boundary features, replacing the original dense prompt embedding and IoU token. To further enhance segmentation, we employ embedding compensation techniques based on Mamba and KAN, which add boundary information to the image embeddings and improve the fusion of prompts within them. We conducted comprehensive experiments, including comparative analyses and ablation studies, to validate the superiority of our proposed approach. Results show that our method significantly improves segmentation performance for FLS images, effectively addressing boundary ambiguity and optimizing prompt utilization.
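To make concrete which two components the abstract says are replaced, the sketch below mirrors how the publicly released SAM mask decoder assembles its inputs: a learned IoU token is placed first, followed by the mask tokens and the sparse (point/box) prompt embeddings, while the dense prompt embedding is added to the image embedding element-wise. This is a minimal illustration under stated assumptions, not the letter's implementation: plain Python lists stand in for tensors (channel-last rather than SAM's channel-first layout), the function names `assemble_decoder_tokens` and `add_dense_prompt` are hypothetical, the "sonar token" and "contour-derived dense embedding" are placeholder values, and the Mamba/KAN embedding compensation is not modeled at all.

```python
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # H x W grid of C-dimensional embeddings (channel-last)


def assemble_decoder_tokens(first_token: Vector,
                            mask_tokens: List[Vector],
                            sparse_prompts: List[Vector]) -> List[Vector]:
    """Build the decoder's token sequence in SAM's order.

    Stock SAM puts its learned IoU token first, then the mask tokens, then
    the sparse prompt embeddings. Under the letter's scheme the first slot
    would hold a sonar token instead (a hypothetical placeholder here).
    """
    return [first_token] + list(mask_tokens) + list(sparse_prompts)


def add_dense_prompt(image_embedding: Grid, dense_prompt: Grid) -> Grid:
    """Add a dense prompt embedding to the image embedding element-wise.

    Stock SAM derives the dense prompt from an optional mask input or a
    learned "no mask" embedding; here it stands in for a hypothetical
    contour/boundary-derived embedding.
    """
    if len(image_embedding) != len(dense_prompt):
        raise ValueError("height mismatch")
    fused: Grid = []
    for img_row, prm_row in zip(image_embedding, dense_prompt):
        if len(img_row) != len(prm_row):
            raise ValueError("width mismatch")
        row = []
        for img_vec, prm_vec in zip(img_row, prm_row):
            if len(img_vec) != len(prm_vec):
                raise ValueError("channel mismatch")
            row.append([a + b for a, b in zip(img_vec, prm_vec)])
        fused.append(row)
    return fused


if __name__ == "__main__":
    sonar_token = [0.5, -0.5]                     # placeholder values, not learned
    mask_tokens = [[0.0, 1.0] for _ in range(4)]  # SAM default: 3 multimask outputs + 1
    point_prompt = [[1.0, 1.0]]                   # one sparse prompt embedding
    tokens = assemble_decoder_tokens(sonar_token, mask_tokens, point_prompt)
    print(len(tokens))  # 6

    image = [[[1.0, 2.0], [3.0, 4.0]]]            # H=1, W=2, C=2
    dense = [[[0.5, 0.5], [0.25, 0.25]]]
    print(add_dense_prompt(image, dense))  # [[[1.5, 2.5], [3.25, 4.25]]]
```

In this toy version a single sonar token occupies the IoU token's slot, so the token sequence keeps its length; the abstract speaks of sonar tokens in the plural, so the actual design may insert more than one.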

Original language	English
Journal	IEEE Geoscience and Remote Sensing Letters
DOIs	https://doi.org/10.1109/LGRS.2025.3562182
State	Accepted/In press - 2025

Keywords

Semantic segmentation
collaborative prompting
embedding compensation
forward-looking sonar (FLS)
multimodal remote sensing

Access to Document

10.1109/LGRS.2025.3562182

Cite this

@article{e1b474c0a120436cbc0011fcaeb1d95b,

title = "Fine-Tuning SAM for Forward-Looking Sonar with Collaborative Prompts and Embedding",

abstract = "The Segment Anything Model (SAM) represents a significant advancement in semantic segmentation, particularly for natural images, but encounters notable limitations when applied to forward-looking sonar (FLS) images. The primary challenges lie in the inherent boundary ambiguity of FLS images, which complicates the use of prompt strategies for accurate boundary delineation, and in the lack of effective interaction between prompts and image features. In this letter, we introduce a collaborative prompting strategy that addresses these issues by generating dense prompt embeddings and sonar tokens that focus on contour and boundary features, replacing the original dense prompt embedding and IoU token. To further enhance segmentation, we employ embedding compensation techniques based on Mamba and KAN, which add boundary information to the image embeddings and improve the fusion of prompts within them. We conducted comprehensive experiments, including comparative analyses and ablation studies, to validate the superiority of our proposed approach. Results show that our method significantly improves segmentation performance for FLS images, effectively addressing boundary ambiguity and optimizing prompt utilization.",

keywords = "Semantic segmentation, collaborative prompting, embedding compensation, forward-looking sonar (FLS), multimodal remote sensing",

author = "Jiayuan Li and Zhen Wang and Nan Xu and Zhuhong You",

note = "Publisher Copyright: {\textcopyright} 2004-2012 IEEE.",

year = "2025",

doi = "10.1109/LGRS.2025.3562182",

language = "English",

journal = "IEEE Geoscience and Remote Sensing Letters",

issn = "1545-598X",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Fine-Tuning SAM for Forward-Looking Sonar with Collaborative Prompts and Embedding

AU - Li, Jiayuan

AU - Wang, Zhen

AU - Xu, Nan

AU - You, Zhuhong

PY - 2025

Y1 - 2025

N2 - The Segment Anything Model (SAM) represents a significant advancement in semantic segmentation, particularly for natural images, but encounters notable limitations when applied to forward-looking sonar (FLS) images. The primary challenges lie in the inherent boundary ambiguity of FLS images, which complicates the use of prompt strategies for accurate boundary delineation, and in the lack of effective interaction between prompts and image features. In this letter, we introduce a collaborative prompting strategy that addresses these issues by generating dense prompt embeddings and sonar tokens that focus on contour and boundary features, replacing the original dense prompt embedding and IoU token. To further enhance segmentation, we employ embedding compensation techniques based on Mamba and KAN, which add boundary information to the image embeddings and improve the fusion of prompts within them. We conducted comprehensive experiments, including comparative analyses and ablation studies, to validate the superiority of our proposed approach. Results show that our method significantly improves segmentation performance for FLS images, effectively addressing boundary ambiguity and optimizing prompt utilization.

AB - The Segment Anything Model (SAM) represents a significant advancement in semantic segmentation, particularly for natural images, but encounters notable limitations when applied to forward-looking sonar (FLS) images. The primary challenges lie in the inherent boundary ambiguity of FLS images, which complicates the use of prompt strategies for accurate boundary delineation, and in the lack of effective interaction between prompts and image features. In this letter, we introduce a collaborative prompting strategy that addresses these issues by generating dense prompt embeddings and sonar tokens that focus on contour and boundary features, replacing the original dense prompt embedding and IoU token. To further enhance segmentation, we employ embedding compensation techniques based on Mamba and KAN, which add boundary information to the image embeddings and improve the fusion of prompts within them. We conducted comprehensive experiments, including comparative analyses and ablation studies, to validate the superiority of our proposed approach. Results show that our method significantly improves segmentation performance for FLS images, effectively addressing boundary ambiguity and optimizing prompt utilization.

KW - Semantic segmentation

KW - collaborative prompting

KW - embedding compensation

KW - forward-looking sonar (FLS)

KW - multimodal remote sensing

UR - http://www.scopus.com/inward/record.url?scp=105002850971&partnerID=8YFLogxK

U2 - 10.1109/LGRS.2025.3562182

DO - 10.1109/LGRS.2025.3562182

M3 - Article

AN - SCOPUS:105002850971

SN - 1545-598X

JO - IEEE Geoscience and Remote Sensing Letters

JF - IEEE Geoscience and Remote Sensing Letters

ER -
