Video Polyp Segmentation: A Deep Learning Perspective

Ge Peng Ji; Guobao Xiao; Yu Cheng Chou; Deng Ping Fan; Kai Zhao; Geng Chen; Luc Van Gool

doi:10.1007/s11633-022-1371-y

School of Computer Science

Research output: Contribution to journal › Article › Peer-reviewed

100 citations (Scopus)

Abstract

We present the first comprehensive video polyp segmentation (VPS) study in the deep learning era. Over the years, progress in VPS has been slow owing to the lack of a large-scale dataset with fine-grained segmentation annotations. To address this issue, we first introduce a high-quality frame-by-frame annotated VPS dataset, named SUN-SEG, which contains 158,690 colonoscopy video frames from the well-known SUN-database. We provide additional annotations of diverse types, i.e., attribute, object mask, boundary, scribble, and polygon. Second, we design a simple but efficient baseline, named PNS+, which consists of a global encoder, a local encoder, and normalized self-attention (NS) blocks. The global and local encoders receive an anchor frame and multiple successive frames to extract long-term and short-term spatial-temporal representations, which are then progressively refined by two NS blocks. Extensive experiments show that PNS+ achieves the best performance and real-time inference speed (170 fps), making it a promising solution for the VPS task. Third, we extensively evaluate 13 representative polyp/object segmentation models on our SUN-SEG dataset and provide attribute-based comparisons. Finally, we discuss several open issues and suggest possible research directions for the VPS community. Our project and dataset are publicly available at https://github.com/GewelsJI/VPS.
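The data flow described in the abstract (an anchor frame through a global encoder, successive frames through a local encoder, then two refinement blocks) can be illustrated with a toy sketch. This is not the authors' PNS+ implementation: the linear projections standing in for the encoders, the plain softmax attention standing in for the NS blocks, the way the two feature sets are combined, and every name below (`linear`, `attend`, `segment_clip`, `rand_matrix`, `rand_frame`) are assumptions made only for illustration. The reference code is in the linked repository.

```python
import math
import random


def linear(tokens, weight):
    """Project each token with a weight matrix of shape (inputs x outputs)."""
    columns = list(zip(*weight))
    return [[sum(x * w for x, w in zip(tok, col)) for col in columns]
            for tok in tokens]


def attend(queries, keys, values):
    """Plain scaled dot-product attention (assumed stand-in for an NS block)."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v[d] for e, v in zip(exps, values))
                    for d in range(len(values[0]))])
    return out


def add(a, b):
    """Element-wise residual sum of two token lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def segment_clip(anchor, clip, w_global, w_local, head):
    """anchor: pixel tokens of one frame; clip: list of frames of pixel tokens."""
    global_feats = linear(anchor, w_global)  # long-term cue (anchor frame)
    local_feats = linear([px for frame in clip for px in frame], w_local)
    # Block 1 (assumption): clip tokens attend to the anchor-frame tokens.
    refined = add(local_feats, attend(local_feats, global_feats, global_feats))
    # Block 2 (assumption): clip tokens attend to each other across frames.
    refined = add(refined, attend(refined, refined, refined))
    # Per-pixel sigmoid gives a foreground probability for each frame.
    probs = [1.0 / (1.0 + math.exp(-sum(x * h for x, h in zip(tok, head))))
             for tok in refined]
    n = len(clip[0])
    return [probs[i * n:(i + 1) * n] for i in range(len(clip))]


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rand_frame(pixels=16, channels=3):
    return [[random.random() for _ in range(channels)] for _ in range(pixels)]


random.seed(0)
anchor = rand_frame()
clip = [rand_frame() for _ in range(3)]
head = [random.uniform(-0.5, 0.5) for _ in range(4)]
masks = segment_clip(anchor, clip, rand_matrix(3, 4), rand_matrix(3, 4), head)
print(len(masks), len(masks[0]))  # one 16-pixel probability map per clip frame
```

The sketch only shows the shape of the computation: one probability map per input frame, with each pixel conditioned on both the anchor frame and its neighbouring frames.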

Original language	English
Pages (from-to)	531-549
Number of pages	19
Journal	Machine Intelligence Research
Volume	19
Issue number	6
DOI	https://doi.org/10.1007/s11633-022-1371-y
Publication status	Published - Dec 2022

Cite this

@article{49f9447eb1054ca89843cb9ea93126e7,

title = "Video Polyp Segmentation: A Deep Learning Perspective",

abstract = "We present the first comprehensive video polyp segmentation (VPS) study in the deep learning era. Over the years, developments in VPS are not moving forward with ease due to the lack of a large-scale dataset with fine-grained segmentation annotations. To address this issue, we first introduce a high-quality frame-by-frame annotated VPS dataset, named SUN-SEG, which contains 158 690 colonoscopy video frames from the well-known SUN-database. We provide additional annotation covering diverse types, i.e., attribute, object mask, boundary, scribble, and polygon. Second, we design a simple but efficient baseline, named PNS+, which consists of a global encoder, a local encoder, and normalized self-attention (NS) blocks. The global and local encoders receive an anchor frame and multiple successive frames to extract long-term and short-term spatial-temporal representations, which are then progressively refined by two NS blocks. Extensive experiments show that PNS+ achieves the best performance and real-time inference speed (170 fps), making it a promising solution for the VPS task. Third, we extensively evaluate 13 representative polyp/object segmentation models on our SUN-SEG dataset and provide attribute-based comparisons. Finally, we discuss several open issues and suggest possible research directions for the VPS community. Our project and dataset are publicly available at https://github.com/GewelsJI/VPS.",

keywords = "Video polyp segmentation (VPS), abdomen, colonoscopy, dataset, self-attention",

author = "Ji, {Ge Peng} and Guobao Xiao and Chou, {Yu Cheng} and Fan, {Deng Ping} and Kai Zhao and Geng Chen and {Van Gool}, Luc",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s).",

year = "2022",

month = dec,

doi = "10.1007/s11633-022-1371-y",

language = "English",

volume = "19",

pages = "531--549",

journal = "Machine Intelligence Research",

issn = "2731-538X",

publisher = "Chinese Academy of Sciences",

number = "6",

}

TY - JOUR

T1 - Video Polyp Segmentation

T2 - A Deep Learning Perspective

AU - Ji, Ge Peng

AU - Xiao, Guobao

AU - Chou, Yu Cheng

AU - Fan, Deng Ping

AU - Zhao, Kai

AU - Chen, Geng

AU - Van Gool, Luc

PY - 2022/12

Y1 - 2022/12

N2 - We present the first comprehensive video polyp segmentation (VPS) study in the deep learning era. Over the years, developments in VPS are not moving forward with ease due to the lack of a large-scale dataset with fine-grained segmentation annotations. To address this issue, we first introduce a high-quality frame-by-frame annotated VPS dataset, named SUN-SEG, which contains 158 690 colonoscopy video frames from the well-known SUN-database. We provide additional annotation covering diverse types, i.e., attribute, object mask, boundary, scribble, and polygon. Second, we design a simple but efficient baseline, named PNS+, which consists of a global encoder, a local encoder, and normalized self-attention (NS) blocks. The global and local encoders receive an anchor frame and multiple successive frames to extract long-term and short-term spatial-temporal representations, which are then progressively refined by two NS blocks. Extensive experiments show that PNS+ achieves the best performance and real-time inference speed (170 fps), making it a promising solution for the VPS task. Third, we extensively evaluate 13 representative polyp/object segmentation models on our SUN-SEG dataset and provide attribute-based comparisons. Finally, we discuss several open issues and suggest possible research directions for the VPS community. Our project and dataset are publicly available at https://github.com/GewelsJI/VPS.

AB - We present the first comprehensive video polyp segmentation (VPS) study in the deep learning era. Over the years, developments in VPS are not moving forward with ease due to the lack of a large-scale dataset with fine-grained segmentation annotations. To address this issue, we first introduce a high-quality frame-by-frame annotated VPS dataset, named SUN-SEG, which contains 158 690 colonoscopy video frames from the well-known SUN-database. We provide additional annotation covering diverse types, i.e., attribute, object mask, boundary, scribble, and polygon. Second, we design a simple but efficient baseline, named PNS+, which consists of a global encoder, a local encoder, and normalized self-attention (NS) blocks. The global and local encoders receive an anchor frame and multiple successive frames to extract long-term and short-term spatial-temporal representations, which are then progressively refined by two NS blocks. Extensive experiments show that PNS+ achieves the best performance and real-time inference speed (170 fps), making it a promising solution for the VPS task. Third, we extensively evaluate 13 representative polyp/object segmentation models on our SUN-SEG dataset and provide attribute-based comparisons. Finally, we discuss several open issues and suggest possible research directions for the VPS community. Our project and dataset are publicly available at https://github.com/GewelsJI/VPS.

KW - Video polyp segmentation (VPS)

KW - abdomen

KW - colonoscopy

KW - dataset

KW - self-attention

UR - http://www.scopus.com/inward/record.url?scp=85141421506&partnerID=8YFLogxK

U2 - 10.1007/s11633-022-1371-y

DO - 10.1007/s11633-022-1371-y

M3 - Article

AN - SCOPUS:85141421506

SN - 2731-538X

VL - 19

SP - 531

EP - 549

JO - Machine Intelligence Research

JF - Machine Intelligence Research

IS - 6

ER -
