One-shot In-context Part Segmentation

Zhenqi Dai; Ting Liu; Xingxing Zhang; Yunchao Wei; Yanning Zhang

doi:10.1145/3664647.3680989

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we present the One-shot In-context Part Segmentation (OIParts) framework, designed to tackle the challenges of part segmentation by leveraging visual foundation models (VFMs). Existing training-based one-shot part segmentation methods that utilize VFMs encounter difficulties when faced with scenarios where the one-shot image and test image exhibit significant variance in appearance and perspective, or when the object in the test image is partially visible. We argue that training on the one-shot example often leads to overfitting, thereby compromising the model's generalization capability. Our framework offers a novel approach to part segmentation that is training-free, flexible, and data-efficient, requiring only a single in-context example for precise segmentation with superior generalization ability. By thoroughly exploring the complementary strengths of VFMs, specifically DINOv2 and Stable Diffusion, we introduce an adaptive channel selection approach by minimizing the intra-class distance for better exploiting these two features, thereby enhancing the discriminatory power of the extracted features for the fine-grained parts. We have achieved remarkable segmentation performance across diverse object categories. The OIParts framework not only eliminates the need for extensive labeled data but also demonstrates superior generalization ability. Through comprehensive experimentation on three benchmark datasets, we have demonstrated the superiority of our proposed method over existing part segmentation approaches in one-shot settings. Code is available at https://github.com/dai647/OIParts.
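The abstract's idea of selecting feature channels by minimizing intra-class distance can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes, hypothetically, that each labeled pixel of the one-shot example has a feature vector formed by concatenating DINOv2 and Stable Diffusion features, and that test pixels are labeled by the nearest part prototype. The function names `intra_class_scores`, `select_channels`, `part_prototypes`, and `classify` are illustrative only.

```python
from collections import defaultdict


def intra_class_scores(features, labels):
    """Per-channel mean squared distance of each vector to its class mean.

    features: list of equal-length lists of floats (one vector per pixel).
    labels:   list of part labels, one per vector.
    """
    n_channels = len(features[0])
    groups = defaultdict(list)
    for vec, lab in zip(features, labels):
        groups[lab].append(vec)
    scores = [0.0] * n_channels
    for vecs in groups.values():
        for c in range(n_channels):
            mean_c = sum(v[c] for v in vecs) / len(vecs)
            scores[c] += sum((v[c] - mean_c) ** 2 for v in vecs)
    return [s / len(features) for s in scores]


def select_channels(features, labels, k):
    """Return the indices (ascending) of the k channels with the
    smallest intra-class distance."""
    scores = intra_class_scores(features, labels)
    best = sorted(range(len(scores)), key=scores.__getitem__)[:k]
    return sorted(best)


def part_prototypes(features, labels, channels):
    """Mean feature vector per part, restricted to the selected channels."""
    groups = defaultdict(list)
    for vec, lab in zip(features, labels):
        groups[lab].append([vec[c] for c in channels])
    return {
        lab: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lab, vecs in groups.items()
    }


def classify(vec, prototypes, channels):
    """Assign a test vector to the part with the nearest prototype
    (squared Euclidean distance on the selected channels)."""
    sub = [vec[c] for c in channels]
    return min(
        prototypes,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(sub, prototypes[lab])),
    )
```

Because nothing is trained, the only per-example computation in this sketch is the channel ranking and the prototype means. One caveat of the simplification: a channel that is constant everywhere also has zero intra-class distance, so a practical criterion would likely need an additional term that rewards separation between parts.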

Original language	English
Title of host publication	MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
Publisher	Association for Computing Machinery, Inc
Pages	10966-10975
Number of pages	10
ISBN (Electronic)	9798400706868
DOIs	https://doi.org/10.1145/3664647.3680989
State	Published - 28 Oct 2024
Event	32nd ACM International Conference on Multimedia, MM 2024 - Melbourne, Australia; Duration: 28 Oct 2024 → 1 Nov 2024

Publication series

Name	MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia

Conference

Conference	32nd ACM International Conference on Multimedia, MM 2024
Country/Territory	Australia
City	Melbourne
Period	28/10/24 → 1/11/24

Keywords

one-shot segmentation
part segmentation
semantic segmentation

Cite this

@inproceedings{ed3aa7e288ac4ac69ea7ece075349323,

title = "One-shot In-context Part Segmentation",

abstract = "In this paper, we present the One-shot In-context Part Segmentation (OIParts) framework, designed to tackle the challenges of part segmentation by leveraging visual foundation models (VFMs). Existing training-based one-shot part segmentation methods that utilize VFMs encounter difficulties when faced with scenarios where the one-shot image and test image exhibit significant variance in appearance and perspective, or when the object in the test image is partially visible. We argue that training on the one-shot example often leads to overfitting, thereby compromising the model's generalization capability. Our framework offers a novel approach to part segmentation that is training-free, flexible, and data-efficient, requiring only a single in-context example for precise segmentation with superior generalization ability. By thoroughly exploring the complementary strengths of VFMs, specifically DINOv2 and Stable Diffusion, we introduce an adaptive channel selection approach by minimizing the intra-class distance for better exploiting these two features, thereby enhancing the discriminatory power of the extracted features for the fine-grained parts. We have achieved remarkable segmentation performance across diverse object categories. The OIParts framework not only eliminates the need for extensive labeled data but also demonstrates superior generalization ability. Through comprehensive experimentation on three benchmark datasets, we have demonstrated the superiority of our proposed method over existing part segmentation approaches in one-shot settings. Code is available at https://github.com/dai647/OIParts.",

keywords = "one-shot segmentation, part segmentation, semantic segmentation",

author = "Zhenqi Dai and Ting Liu and Xingxing Zhang and Yunchao Wei and Yanning Zhang",

note = "Publisher Copyright: {\textcopyright} 2024 ACM.; 32nd ACM International Conference on Multimedia, MM 2024 ; Conference date: 28-10-2024 Through 01-11-2024",

year = "2024",

month = oct,

day = "28",

doi = "10.1145/3664647.3680989",

language = "English",

series = "MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia",

publisher = "Association for Computing Machinery, Inc",

pages = "10966--10975",

booktitle = "MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia",

}

Dai, Z, Liu, T, Zhang, X, Wei, Y & Zhang, Y 2024, One-shot In-context Part Segmentation. in MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia. MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, Association for Computing Machinery, Inc, pp. 10966-10975, 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, Australia, 28/10/24. https://doi.org/10.1145/3664647.3680989

One-shot In-context Part Segmentation. / Dai, Zhenqi; Liu, Ting; Zhang, Xingxing et al.
MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia. Association for Computing Machinery, Inc, 2024. p. 10966-10975 (MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia).

TY - GEN

T1 - One-shot In-context Part Segmentation

AU - Dai, Zhenqi

AU - Liu, Ting

AU - Zhang, Xingxing

AU - Wei, Yunchao

AU - Zhang, Yanning

PY - 2024/10/28

Y1 - 2024/10/28

N2 - In this paper, we present the One-shot In-context Part Segmentation (OIParts) framework, designed to tackle the challenges of part segmentation by leveraging visual foundation models (VFMs). Existing training-based one-shot part segmentation methods that utilize VFMs encounter difficulties when faced with scenarios where the one-shot image and test image exhibit significant variance in appearance and perspective, or when the object in the test image is partially visible. We argue that training on the one-shot example often leads to overfitting, thereby compromising the model's generalization capability. Our framework offers a novel approach to part segmentation that is training-free, flexible, and data-efficient, requiring only a single in-context example for precise segmentation with superior generalization ability. By thoroughly exploring the complementary strengths of VFMs, specifically DINOv2 and Stable Diffusion, we introduce an adaptive channel selection approach by minimizing the intra-class distance for better exploiting these two features, thereby enhancing the discriminatory power of the extracted features for the fine-grained parts. We have achieved remarkable segmentation performance across diverse object categories. The OIParts framework not only eliminates the need for extensive labeled data but also demonstrates superior generalization ability. Through comprehensive experimentation on three benchmark datasets, we have demonstrated the superiority of our proposed method over existing part segmentation approaches in one-shot settings. Code is available at https://github.com/dai647/OIParts.

KW - one-shot segmentation

KW - part segmentation

KW - semantic segmentation

UR - http://www.scopus.com/inward/record.url?scp=85209777099&partnerID=8YFLogxK

U2 - 10.1145/3664647.3680989

DO - 10.1145/3664647.3680989

M3 - Conference contribution

AN - SCOPUS:85209777099

T3 - MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia

SP - 10966

EP - 10975

BT - MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia

PB - Association for Computing Machinery, Inc

T2 - 32nd ACM International Conference on Multimedia, MM 2024

Y2 - 28 October 2024 through 1 November 2024

ER -
