Advanced Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection

Long Li; Huichao Xie; Nian Liu; Dingwen Zhang; Rao Muhammad Anwer; Hisham Cholakkal; Junwei Han

doi:10.1109/TPAMI.2025.3573054

Advanced Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection

Long Li, Huichao Xie, Nian Liu, Dingwen Zhang, Rao Muhammad Anwer, Hisham Cholakkal, Junwei Han

自动化学院

科研成果: 期刊稿件 › 文章 › 同行评审

摘要

Most existing CoSOD models focus solely on extracting co-saliency cues while neglecting explicit exploration of background regions, potentially leading to difficulties in handling interference from complex background areas. To address this, this paper proposes a Discriminative co-saliency and background Mining Transformer framework (DMT) to explicitly mine both co-saliency and background information and effectively model their discriminability. DMT first learns two types of tokens by disjointly extracting co-saliency and background information from segmentation features, then performs discriminability within the segmentation features guided by these well-learned tokens. In the first phase, we propose economic multi-grained correlation modules for efficient detection information extraction, including Region-to-Region (R2R), Contrast-induced Pixel-to-Token (CtP2T), and Co-saliency Token-to-Token (CoT2T) correlation modules. In the subsequent phase, we introduce Token-Guided Feature Refinement (TGFR) modules to enhance discriminability within the segmentation features. To further enhance the discriminative modeling and practicality of DMT, we first upgrade the original TGFR's intra-image modeling approach to an intra-group one, thus proposing Group TGFR (G-TGFR), which is more suitable for the co-saliency task. Subsequently, we designed a Noise Propagation Suppression (NPS) mechanism to apply our model to a more practical open-world scenario, ultimately presenting our extended version, i.e., DMT+O. Extensive experimental results on both conventional CoSOD and open-world CoSOD benchmark datasets demonstrate the effectiveness of our proposed model.

源语言	英语
期刊	IEEE Transactions on Pattern Analysis and Machine Intelligence
DOI	https://doi.org/10.1109/TPAMI.2025.3573054
出版状态	已接受/待刊 - 2025

访问文件

10.1109/TPAMI.2025.3573054

其它文件与链接

链接到 Scopus 的出版物

引用此

@article{3cd66cccf9304ea99c07b8b7396e5985,

title = "Advanced Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection",

abstract = "Most existing CoSOD models focus solely on extracting co-saliency cues while neglecting explicit exploration of background regions, potentially leading to difficulties in handling interference from complex background areas. To address this, this paper proposes a Discriminative co-saliency and background Mining Transformer framework (DMT) to explicitly mine both co-saliency and background information and effectively model their discriminability. DMT first learns two types of tokens by disjointly extracting co-saliency and background information from segmentation features, then performs discriminability within the segmentation features guided by these well-learned tokens. In the first phase, we propose economic multi-grained correlation modules for efficient detection information extraction, including Region-to-Region (R2R), Contrast-induced Pixel-to-Token (CtP2T), and Co-saliency Token-to-Token (CoT2T) correlation modules. In the subsequent phase, we introduce Token-Guided Feature Refinement (TGFR) modules to enhance discriminability within the segmentation features. To further enhance the discriminative modeling and practicality of DMT, we first upgrade the original TGFR's intra-image modeling approach to an intra-group one, thus proposing Group TGFR (G-TGFR), which is more suitable for the co-saliency task. Subsequently, we designed a Noise Propagation Suppression (NPS) mechanism to apply our model to a more practical open-world scenario, ultimately presenting our extended version, i.e., DMT+O. Extensive experimental results on both conventional CoSOD and open-world CoSOD benchmark datasets demonstrate the effectiveness of our proposed model.",

keywords = "Co-salient object detection, Discriminability modeling, Multi-grained correlations, Open-world visual recognition, Transformer",

author = "Long Li and Huichao Xie and Nian Liu and Dingwen Zhang and Anwer, {Rao Muhammad} and Hisham Cholakkal and Junwei Han",

note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",

year = "2025",

doi = "10.1109/TPAMI.2025.3573054",

language = "英语",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

}

TY - JOUR

T1 - Advanced Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection

AU - Li, Long

AU - Xie, Huichao

AU - Liu, Nian

AU - Zhang, Dingwen

AU - Anwer, Rao Muhammad

AU - Cholakkal, Hisham

AU - Han, Junwei

PY - 2025

Y1 - 2025

N2 - Most existing CoSOD models focus solely on extracting co-saliency cues while neglecting explicit exploration of background regions, potentially leading to difficulties in handling interference from complex background areas. To address this, this paper proposes a Discriminative co-saliency and background Mining Transformer framework (DMT) to explicitly mine both co-saliency and background information and effectively model their discriminability. DMT first learns two types of tokens by disjointly extracting co-saliency and background information from segmentation features, then performs discriminability within the segmentation features guided by these well-learned tokens. In the first phase, we propose economic multi-grained correlation modules for efficient detection information extraction, including Region-to-Region (R2R), Contrast-induced Pixel-to-Token (CtP2T), and Co-saliency Token-to-Token (CoT2T) correlation modules. In the subsequent phase, we introduce Token-Guided Feature Refinement (TGFR) modules to enhance discriminability within the segmentation features. To further enhance the discriminative modeling and practicality of DMT, we first upgrade the original TGFR's intra-image modeling approach to an intra-group one, thus proposing Group TGFR (G-TGFR), which is more suitable for the co-saliency task. Subsequently, we designed a Noise Propagation Suppression (NPS) mechanism to apply our model to a more practical open-world scenario, ultimately presenting our extended version, i.e., DMT+O. Extensive experimental results on both conventional CoSOD and open-world CoSOD benchmark datasets demonstrate the effectiveness of our proposed model.

AB - Most existing CoSOD models focus solely on extracting co-saliency cues while neglecting explicit exploration of background regions, potentially leading to difficulties in handling interference from complex background areas. To address this, this paper proposes a Discriminative co-saliency and background Mining Transformer framework (DMT) to explicitly mine both co-saliency and background information and effectively model their discriminability. DMT first learns two types of tokens by disjointly extracting co-saliency and background information from segmentation features, then performs discriminability within the segmentation features guided by these well-learned tokens. In the first phase, we propose economic multi-grained correlation modules for efficient detection information extraction, including Region-to-Region (R2R), Contrast-induced Pixel-to-Token (CtP2T), and Co-saliency Token-to-Token (CoT2T) correlation modules. In the subsequent phase, we introduce Token-Guided Feature Refinement (TGFR) modules to enhance discriminability within the segmentation features. To further enhance the discriminative modeling and practicality of DMT, we first upgrade the original TGFR's intra-image modeling approach to an intra-group one, thus proposing Group TGFR (G-TGFR), which is more suitable for the co-saliency task. Subsequently, we designed a Noise Propagation Suppression (NPS) mechanism to apply our model to a more practical open-world scenario, ultimately presenting our extended version, i.e., DMT+O. Extensive experimental results on both conventional CoSOD and open-world CoSOD benchmark datasets demonstrate the effectiveness of our proposed model.

KW - Co-salient object detection

KW - Discriminability modeling

KW - Multi-grained correlations

KW - Open-world visual recognition

KW - Transformer

UR - http://www.scopus.com/inward/record.url?scp=105006742431&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2025.3573054

DO - 10.1109/TPAMI.2025.3573054

M3 - 文章

AN - SCOPUS:105006742431

SN - 0162-8828

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

ER -

Advanced Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection

摘要

访问文件

其它文件与链接

指纹

引用此