SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

Wenxi Yue; Jing Zhang; Kun Hu; Yong Xia; Jiebo Luo; Zhiyong Wang

doi:10.1609/aaai.v38i7.28514

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

Wenxi Yue, Jing Zhang, Kun Hu, Yong Xia, Jiebo Luo, Zhiyong Wang

计算机学院

科研成果: 书/报告/会议事项章节 › 会议稿件 › 同行评审

24 引用（Scopus）

摘要

The Segment Anything Model (SAM) is a powerful foundation model that has revolutionised image segmentation. To apply SAM to surgical instrument segmentation, a common approach is to locate precise points or boxes of instruments and then use them as prompts for SAM in a zero-shot manner. However, we observe two problems with this naive pipeline: (1) the domain gap between natural objects and surgical instruments leads to inferior generalisation of SAM; and (2) SAM relies on precise point or box locations for accurate segmentation, requiring either extensive manual guidance or a well-performing specialist detector for prompt preparation, which leads to a complex multi-stage pipeline. To address these problems, we introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to effectively integrate surgical-specific information with SAM's pre-trained knowledge for improved generalisation. Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes and eliminates the use of explicit prompts for improved robustness and a simpler pipeline. In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning, further enhancing the discrimination of the class prototypes for more accurate class prompting. The results of extensive experiments on both EndoVis2018 and EndoVis2017 datasets demonstrate that SurgicalSAM achieves state-of-the-art performance while only requiring a small number of tunable parameters. The source code is available at https://github.com/wenxi-yue/SurgicalSAM.

源语言	英语
主期刊名	Technical Tracks 14
编辑	Michael Wooldridge, Jennifer Dy, Sriraam Natarajan
出版商	Association for the Advancement of Artificial Intelligence
页	6890-6898
页数	9
版本	7
ISBN（电子版）	1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 1577358872, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879, 9781577358879
DOI	https://doi.org/10.1609/aaai.v38i7.28514
出版状态	已出版 - 25 3月 2024
活动	38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, 加拿大期限: 20 2月 2024 → 27 2月 2024

出版系列

姓名	Proceedings of the AAAI Conference on Artificial Intelligence
编号	7
卷	38
ISSN（印刷版）	2159-5399
ISSN（电子版）	2374-3468

会议

会议	38th AAAI Conference on Artificial Intelligence, AAAI 2024
国家/地区	加拿大
市	Vancouver
时期	20/02/24 → 27/02/24

访问文件

10.1609/aaai.v38i7.28514

其它文件与链接

链接到 Scopus 的出版物

引用此

Yue, W., Zhang, J., Hu, K., Xia, Y., Luo, J., & Wang, Z. (2024). SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation. 在 M. Wooldridge, J. Dy, & S. Natarajan (编辑), Technical Tracks 14 (7 编辑, 页码 6890-6898). (Proceedings of the AAAI Conference on Artificial Intelligence; 卷 38, 号码 7). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v38i7.28514

@inproceedings{167afc00a173400c8de8bbbe5e11d5e2,

title = "SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation",

abstract = "The Segment Anything Model (SAM) is a powerful foundation model that has revolutionised image segmentation. To apply SAM to surgical instrument segmentation, a common approach is to locate precise points or boxes of instruments and then use them as prompts for SAM in a zero-shot manner. However, we observe two problems with this naive pipeline: (1) the domain gap between natural objects and surgical instruments leads to inferior generalisation of SAM; and (2) SAM relies on precise point or box locations for accurate segmentation, requiring either extensive manual guidance or a well-performing specialist detector for prompt preparation, which leads to a complex multi-stage pipeline. To address these problems, we introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to effectively integrate surgical-specific information with SAM's pre-trained knowledge for improved generalisation. Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes and eliminates the use of explicit prompts for improved robustness and a simpler pipeline. In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning, further enhancing the discrimination of the class prototypes for more accurate class prompting. The results of extensive experiments on both EndoVis2018 and EndoVis2017 datasets demonstrate that SurgicalSAM achieves state-of-the-art performance while only requiring a small number of tunable parameters. The source code is available at https://github.com/wenxi-yue/SurgicalSAM.",

author = "Wenxi Yue and Jing Zhang and Kun Hu and Yong Xia and Jiebo Luo and Zhiyong Wang",

note = "Publisher Copyright: Copyright {\textcopyright} 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 38th AAAI Conference on Artificial Intelligence, AAAI 2024 ; Conference date: 20-02-2024 Through 27-02-2024",

year = "2024",

month = mar,

day = "25",

doi = "10.1609/aaai.v38i7.28514",

language = "英语",

series = "Proceedings of the AAAI Conference on Artificial Intelligence",

publisher = "Association for the Advancement of Artificial Intelligence",

number = "7",

pages = "6890--6898",

editor = "Michael Wooldridge and Jennifer Dy and Sriraam Natarajan",

booktitle = "Technical Tracks 14",

edition = "7",

}

Yue, W, Zhang, J, Hu, K, Xia, Y, Luo, J & Wang, Z 2024, SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation. 在 M Wooldridge, J Dy & S Natarajan (编辑), Technical Tracks 14. 7 编辑, Proceedings of the AAAI Conference on Artificial Intelligence, 号码 7, 卷 38, Association for the Advancement of Artificial Intelligence, 页码 6890-6898, 38th AAAI Conference on Artificial Intelligence, AAAI 2024, Vancouver, 加拿大, 20/02/24. https://doi.org/10.1609/aaai.v38i7.28514

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation. / Yue, Wenxi; Zhang, Jing; Hu, Kun 等.
Technical Tracks 14. 编辑 / Michael Wooldridge; Jennifer Dy; Sriraam Natarajan. 7. 编辑 Association for the Advancement of Artificial Intelligence, 2024. 页码 6890-6898 (Proceedings of the AAAI Conference on Artificial Intelligence; 卷 38, 号码 7).

科研成果: 书/报告/会议事项章节 › 会议稿件 › 同行评审

TY - GEN

T1 - SurgicalSAM

T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024

AU - Yue, Wenxi

AU - Zhang, Jing

AU - Hu, Kun

AU - Xia, Yong

AU - Luo, Jiebo

AU - Wang, Zhiyong

PY - 2024/3/25

Y1 - 2024/3/25

N2 - The Segment Anything Model (SAM) is a powerful foundation model that has revolutionised image segmentation. To apply SAM to surgical instrument segmentation, a common approach is to locate precise points or boxes of instruments and then use them as prompts for SAM in a zero-shot manner. However, we observe two problems with this naive pipeline: (1) the domain gap between natural objects and surgical instruments leads to inferior generalisation of SAM; and (2) SAM relies on precise point or box locations for accurate segmentation, requiring either extensive manual guidance or a well-performing specialist detector for prompt preparation, which leads to a complex multi-stage pipeline. To address these problems, we introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to effectively integrate surgical-specific information with SAM's pre-trained knowledge for improved generalisation. Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes and eliminates the use of explicit prompts for improved robustness and a simpler pipeline. In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning, further enhancing the discrimination of the class prototypes for more accurate class prompting. The results of extensive experiments on both EndoVis2018 and EndoVis2017 datasets demonstrate that SurgicalSAM achieves state-of-the-art performance while only requiring a small number of tunable parameters. The source code is available at https://github.com/wenxi-yue/SurgicalSAM.

AB - The Segment Anything Model (SAM) is a powerful foundation model that has revolutionised image segmentation. To apply SAM to surgical instrument segmentation, a common approach is to locate precise points or boxes of instruments and then use them as prompts for SAM in a zero-shot manner. However, we observe two problems with this naive pipeline: (1) the domain gap between natural objects and surgical instruments leads to inferior generalisation of SAM; and (2) SAM relies on precise point or box locations for accurate segmentation, requiring either extensive manual guidance or a well-performing specialist detector for prompt preparation, which leads to a complex multi-stage pipeline. To address these problems, we introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to effectively integrate surgical-specific information with SAM's pre-trained knowledge for improved generalisation. Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes and eliminates the use of explicit prompts for improved robustness and a simpler pipeline. In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning, further enhancing the discrimination of the class prototypes for more accurate class prompting. The results of extensive experiments on both EndoVis2018 and EndoVis2017 datasets demonstrate that SurgicalSAM achieves state-of-the-art performance while only requiring a small number of tunable parameters. The source code is available at https://github.com/wenxi-yue/SurgicalSAM.

UR - http://www.scopus.com/inward/record.url?scp=85186722649&partnerID=8YFLogxK

U2 - 10.1609/aaai.v38i7.28514

DO - 10.1609/aaai.v38i7.28514

M3 - 会议稿件

AN - SCOPUS:85186722649

T3 - Proceedings of the AAAI Conference on Artificial Intelligence

SP - 6890

EP - 6898

BT - Technical Tracks 14

A2 - Wooldridge, Michael

A2 - Dy, Jennifer

A2 - Natarajan, Sriraam

PB - Association for the Advancement of Artificial Intelligence

Y2 - 20 February 2024 through 27 February 2024

ER -

Yue W, Zhang J, Hu K, Xia Y, Luo J, Wang Z. SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation. 在 Wooldridge M, Dy J, Natarajan S, 编辑, Technical Tracks 14. 7 编辑 Association for the Advancement of Artificial Intelligence. 2024. 页码 6890-6898. (Proceedings of the AAAI Conference on Artificial Intelligence; 7). doi: 10.1609/aaai.v38i7.28514

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

摘要

出版系列

会议

访问文件

其它文件与链接

指纹

引用此