TY - JOUR
T1 - Few-Shot Segmentation via Divide-and-Conquer Proxies
AU - Lang, Chunbo
AU - Cheng, Gong
AU - Tu, Binfei
AU - Han, Junwei
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2024/1
Y1 - 2024/1
N2 - Few-Shot segmentation (FSS) is a marginally explored but challenging task that aims to identify unseen classes of objects with only a handful of densely annotated samples. By and large, current FSS approaches perform meta-inference based on the prototype learning paradigm, which fails to fully exploit the underlying information from support image-mask pairs, resulting in multiple segmentation failures, such as incomplete objects, ambiguous boundaries, and distractor activation. For this purpose, a flexible and generic framework is developed in the spirit of divide-and-conquer. We first implement a novel self-reasoning scheme on the labeled support image, and then divide the coarse segmentation mask into several regions with different properties. By employing effective masked average pooling techniques, a series of support-induced proxies are generated on the fly, each performing a specific role in conquering the above challenges. Furthermore, we meticulously devise the parallel decoder structure and semantic consistency regularization to eliminate confusion and enhance discrimination. In stark contrast to conventional prototype-based approaches, our proposed divide-and-conquer proxies (DCP) can provide “episode” level guidelines that go well beyond the object cues themselves. Extensive experiments are conducted on FSS benchmarks to verify the effectiveness, including standard settings as well as cross-domain settings. In particular, we propose a temporal DCP and successfully extend it to video object segmentation via memory repository and progressive propagation, illustrating the high scalability. The source codes are available at https://github.com/chunbolang/DCP .
AB - Few-Shot segmentation (FSS) is a marginally explored but challenging task that aims to identify unseen classes of objects with only a handful of densely annotated samples. By and large, current FSS approaches perform meta-inference based on the prototype learning paradigm, which fails to fully exploit the underlying information from support image-mask pairs, resulting in multiple segmentation failures, such as incomplete objects, ambiguous boundaries, and distractor activation. For this purpose, a flexible and generic framework is developed in the spirit of divide-and-conquer. We first implement a novel self-reasoning scheme on the labeled support image, and then divide the coarse segmentation mask into several regions with different properties. By employing effective masked average pooling techniques, a series of support-induced proxies are generated on the fly, each performing a specific role in conquering the above challenges. Furthermore, we meticulously devise the parallel decoder structure and semantic consistency regularization to eliminate confusion and enhance discrimination. In stark contrast to conventional prototype-based approaches, our proposed divide-and-conquer proxies (DCP) can provide “episode” level guidelines that go well beyond the object cues themselves. Extensive experiments are conducted on FSS benchmarks to verify the effectiveness, including standard settings as well as cross-domain settings. In particular, we propose a temporal DCP and successfully extend it to video object segmentation via memory repository and progressive propagation, illustrating the high scalability. The source codes are available at https://github.com/chunbolang/DCP .
KW - Few-Shot learning
KW - Few-Shot segmentation
KW - Prototype learning
KW - Semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=85168868841&partnerID=8YFLogxK
U2 - 10.1007/s11263-023-01886-8
DO - 10.1007/s11263-023-01886-8
M3 - Article
AN - SCOPUS:85168868841
SN - 0920-5691
VL - 132
SP - 261
EP - 283
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
IS - 1
ER -