NTRENet++: Unleashing the Power of Non-Target Knowledge for Few-Shot Semantic Segmentation

Yuanwei Liu; Nian Liu; Yi Wu; Hisham Cholakkal; Rao Muhammad Anwer; Xiwen Yao; Junwei Han

doi:10.1109/TCSVT.2024.3519573

School of Automation

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Few-shot semantic segmentation (FSS) aims to segment the target object under the condition of a few annotated samples. However, current studies on FSS primarily concentrate on extracting information related to the object, resulting in inadequate identification of ambiguous regions, particularly in non-target areas, including the background (BG) and Distracting Objects (DOs). Intuitively, to alleviate this problem, we propose a novel framework, namely NTRENet++, to explicitly mine and eliminate BG and DO regions in the query. First, we introduce a BG Mining Module (BGMM) to extract BG information and generate a comprehensive BG prototype from all images. For this purpose, a BG mining loss is formulated to supervise the learning of BGMM, utilizing only the known target object segmentation ground truth. Subsequently, based on this BG prototype, we employ a BG Eliminating Module to filter out the BG information from the query and obtain a BG-free result. Following this, the target information is utilized in the target matching module to generate the initial segmentation result. Finally, a DO Eliminating Module is proposed to further mine and eliminate DO regions, based on which we can obtain a BG and DO-free target object segmentation result. Moreover, we present a prototypical-pixel contrastive learning algorithm to enhance the model’s capability to differentiate the target object from DOs. Extensive experiments conducted on both PASCAL-5ⁱ and COCO-20ⁱ datasets demonstrate the effectiveness of our approach despite its simplicity. Additionally, we extend our method to the few-shot video object segmentation task and achieve improved performance on a baseline model, demonstrating its generalization ability.

Original language	English
Pages (from-to)	4314-4328
Number of pages	15
Journal	IEEE Transactions on Circuits and Systems for Video Technology
Volume	35
Issue number	5
DOIs	https://doi.org/10.1109/TCSVT.2024.3519573
State	Published - 2025

Keywords

Few-shot learning
few-shot segmentation
semantic segmentation
video object segmentation

Access to Document

10.1109/TCSVT.2024.3519573

Cite this

@article{ea1814a6150b409aa4f33d72c4469971,

title = "NTRENet++: Unleashing the Power of Non-Target Knowledge for Few-Shot Semantic Segmentation",

abstract = "Few-shot semantic segmentation (FSS) aims to segment the target object under the condition of a few annotated samples. However, current studies on FSS primarily concentrate on extracting information related to the object, resulting in inadequate identification of ambiguous regions, particularly in non-target areas, including the background (BG) and Distracting Objects (DOs). Intuitively, to alleviate this problem, we propose a novel framework, namely NTRENet++, to explicitly mine and eliminate BG and DO regions in the query. First, we introduce a BG Mining Module (BGMM) to extract BG information and generate a comprehensive BG prototype from all images. For this purpose, a BG mining loss is formulated to supervise the learning of BGMM, utilizing only the known target object segmentation ground truth. Subsequently, based on this BG prototype, we employ a BG Eliminating Module to filter out the BG information from the query and obtain a BG-free result. Following this, the target information is utilized in the target matching module to generate the initial segmentation result. Finally, a DO Eliminating Module is proposed to further mine and eliminate DO regions, based on which we can obtain a BG and DO-free target object segmentation result. Moreover, we present a prototypical-pixel contrastive learning algorithm to enhance the model{\textquoteright}s capability to differentiate the target object from DOs. Extensive experiments conducted on both PASCAL-5$^i$ and COCO-20$^i$ datasets demonstrate the effectiveness of our approach despite its simplicity. Additionally, we extend our method to the few-shot video object segmentation task and achieve improved performance on a baseline model, demonstrating its generalization ability.",

keywords = "Few-shot learning, few-shot segmentation, semantic segmentation, video object segmentation",

author = "Yuanwei Liu and Nian Liu and Yi Wu and Hisham Cholakkal and Rao Muhammad Anwer and Xiwen Yao and Junwei Han",

note = "Publisher Copyright: {\textcopyright} 1991-2012 IEEE.",

year = "2025",

doi = "10.1109/TCSVT.2024.3519573",

language = "English",

volume = "35",

pages = "4314--4328",

journal = "IEEE Transactions on Circuits and Systems for Video Technology",

issn = "1051-8215",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "5",

}

TY - JOUR

T1 - NTRENet++

T2 - Unleashing the Power of Non-Target Knowledge for Few-Shot Semantic Segmentation

AU - Liu, Yuanwei

AU - Liu, Nian

AU - Wu, Yi

AU - Cholakkal, Hisham

AU - Anwer, Rao Muhammad

AU - Yao, Xiwen

AU - Han, Junwei

PY - 2025

Y1 - 2025

AB - Few-shot semantic segmentation (FSS) aims to segment the target object under the condition of a few annotated samples. However, current studies on FSS primarily concentrate on extracting information related to the object, resulting in inadequate identification of ambiguous regions, particularly in non-target areas, including the background (BG) and Distracting Objects (DOs). Intuitively, to alleviate this problem, we propose a novel framework, namely NTRENet++, to explicitly mine and eliminate BG and DO regions in the query. First, we introduce a BG Mining Module (BGMM) to extract BG information and generate a comprehensive BG prototype from all images. For this purpose, a BG mining loss is formulated to supervise the learning of BGMM, utilizing only the known target object segmentation ground truth. Subsequently, based on this BG prototype, we employ a BG Eliminating Module to filter out the BG information from the query and obtain a BG-free result. Following this, the target information is utilized in the target matching module to generate the initial segmentation result. Finally, a DO Eliminating Module is proposed to further mine and eliminate DO regions, based on which we can obtain a BG and DO-free target object segmentation result. Moreover, we present a prototypical-pixel contrastive learning algorithm to enhance the model’s capability to differentiate the target object from DOs. Extensive experiments conducted on both PASCAL-5ⁱ and COCO-20ⁱ datasets demonstrate the effectiveness of our approach despite its simplicity. Additionally, we extend our method to the few-shot video object segmentation task and achieve improved performance on a baseline model, demonstrating its generalization ability.

KW - Few-shot learning

KW - few-shot segmentation

KW - semantic segmentation

KW - video object segmentation

UR - http://www.scopus.com/inward/record.url?scp=85212786847&partnerID=8YFLogxK

U2 - 10.1109/TCSVT.2024.3519573

DO - 10.1109/TCSVT.2024.3519573

M3 - Article

AN - SCOPUS:85212786847

SN - 1051-8215

VL - 35

SP - 4314

EP - 4328

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

IS - 5

ER -
