Bridge the Intra-Class Gap: K-Shot Multi-Scale Intermediate Prototype Mining Transformer for Few-Shot Semantic Segmentation

Research output: Contribution to journal › Article › peer-review

Abstract

Few-shot segmentation (FSS) aims to segment target objects in a query image using only a few annotated support images. Existing approaches typically transfer category information from the support set directly to the query to identify the target objects. However, these methods often ignore the category information gap between query and support images, which leads to suboptimal performance when objects of the same class exhibit significant intra-class diversity. To address this issue, we propose a framework that introduces intermediate prototypes to capture both deterministic information from the support images and adaptive knowledge from the query at multiple scales. The framework, named the K-shot Multi-scale Intermediate Prototype Mining Transformer (KMIPMT), is based on the Transformer architecture and learns the intermediate prototypes iteratively: each KMIPMT layer propagates category information from both the K-shot support features and the multi-scale query features to the intermediate prototypes, which are then used to activate the query feature map. Through repeated iterations, both the intermediate prototypes and the query feature are progressively enhanced, and the final refined query feature is used to generate the segmentation prediction. Despite its simplicity, our method sets new state-of-the-art results on the standard benchmarks PASCAL-5^i, COCO-20^i, and FSS-1000. We also explore several practical and challenging extensions of the method, including 3D point cloud FSS, zero-shot segmentation, weak-label FSS, and cross-domain FSS, which demonstrate the versatility of KMIPMT across domains and scenarios.
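
The iterative update described in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the abstract does not specify the layer design, so the plain dot-product cross-attention, the residual updates, the 2x2 average pooling used as a second scale, and all names (`attend`, `avg_pool2`, `mine_prototypes`) are assumptions made purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, feats, mask=None):
    """Dot-product cross-attention: queries (N, C) read from feats (M, C).

    mask (M,) optionally restricts attention to positions where mask > 0.
    """
    scores = queries @ feats.T / np.sqrt(feats.shape[1])  # (N, M)
    if mask is not None:
        scores = np.where(mask[None, :] > 0, scores, -1e9)
    return softmax(scores, axis=-1) @ feats  # (N, C)

def avg_pool2(fmap):
    """2x2 average pooling of an (H, W, C) map; H and W must be even."""
    h, w, c = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def mine_prototypes(support_feats, support_masks, query_feat, prototypes,
                    num_layers=3):
    """Hypothetical iterative prototype mining loop (illustration only).

    support_feats: (K, H, W, C), support_masks: (K, H, W),
    query_feat: (H, W, C), prototypes: (N, C).
    """
    k, h, w, c = support_feats.shape
    s_tokens = support_feats.reshape(k * h * w, c)  # all K shots as one token set
    s_mask = support_masks.reshape(k * h * w)
    q = query_feat
    for _ in range(num_layers):
        # Deterministic information: read from support foreground pixels.
        prototypes = prototypes + attend(prototypes, s_tokens, s_mask)
        # Adaptive information: read from the query at two scales (assumed).
        q_tokens = np.concatenate(
            [f.reshape(-1, c) for f in (q, avg_pool2(q))], axis=0)
        prototypes = prototypes + attend(prototypes, q_tokens)
        # Activate the query feature map with the updated prototypes.
        flat = q.reshape(h * w, c)
        q = (flat + attend(flat, prototypes)).reshape(h, w, c)
    return prototypes, q
```

In this sketch the prototypes and the query feature are refined together on every pass, mirroring the "repeated iterations" in the abstract; a segmentation head applied to the returned query feature would produce the final mask.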

Original language: English
Pages (from-to): 11003-11021
Number of pages: 19
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 47
Issue number: 12
State: Published - 2025

Keywords

  • Few-shot
  • intermediate prototype
  • semantic segmentation
