TY - JOUR
T1 - Bridge the Intra-Class Gap
T2 - K-Shot Multi-Scale Intermediate Prototype Mining Transformer for Few-Shot Semantic Segmentation
AU - Liu, Yuanwei
AU - Liu, Nian
AU - Jiang, Tao
AU - Yao, Xiwen
AU - Anwer, Rao Muhammad
AU - Cholakkal, Hisham
AU - Han, Junwei
N1 - Publisher Copyright:
© 1979-2012 IEEE.
PY - 2025
Y1 - 2025
AB - Few-shot segmentation (FSS) aims to accurately segment target objects in a query image using only a limited number of annotated support images. Existing approaches typically follow a paradigm that directly leverages category information from the support set to identify target objects in the query. However, these methods often ignore the category information gap between query and support images, leading to suboptimal performance when faced with images containing objects exhibiting significant intra-class diversity. To address this issue, we propose a novel framework that introduces intermediate prototypes to capture both deterministic information from the support images and adaptive knowledge from the query at multiple scales. Our framework, named the K-shot Multi-scale Intermediate Prototype Mining Transformer (KMIPMT), is based on the Transformer architecture and learns intermediate prototypes in an iterative manner, where each KMIPMT layer propagates category information from both K-shot support features and multi-scale query features to intermediate prototypes. This information is then utilized to activate the query feature map. Through repeated iterations, both intermediate prototypes and the query feature are progressively enhanced, and the final refined query feature is used for generating precise segmentation predictions. Despite its simplicity, our method achieves remarkable performance gains on standard benchmarks, including PASCAL-5i, COCO-20i, and FSS-1000, setting new state-of-the-art results. Furthermore, we explore several practical and challenging extensions of our method, including 3D point cloud FSS, zero-shot segmentation, weak-label FSS, and cross-domain FSS. These extensions showcase the versatility and effectiveness of our proposed KMIPMT framework across different domains and scenarios.
KW - Few-shot
KW - intermediate prototype
KW - semantic segmentation
UR - https://www.scopus.com/pages/publications/105012462129
U2 - 10.1109/TPAMI.2025.3593816
DO - 10.1109/TPAMI.2025.3593816
M3 - Article
AN - SCOPUS:105012462129
SN - 0162-8828
VL - 47
SP - 11003
EP - 11021
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 12
ER -