TY - JOUR
T1 - Deep unsupervised part-whole relational visual saliency
AU - Liu, Yi
AU - Dong, Xiaohui
AU - Zhang, Dingwen
AU - Xu, Shoukun
N1 - Publisher Copyright:
© 2023 Elsevier B.V.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - Deep Supervised Salient Object Detection (SSOD) relies heavily on large-scale pixel-level annotations, which are labour-intensive to acquire at high quality. Under this constraint, deep Unsupervised Salient Object Detection (USOD) has attracted growing attention. Existing deep USOD methods mostly generate pseudo labels by fusing the results of several hand-crafted detectors, and then train a Fully Convolutional Network (FCN) on these labels to detect salient regions. Although such methods have made progress, they still fall short of satisfactory performance on complex scenes, owing to (1) poor object wholeness caused by neglecting the hierarchy of salient regions, and (2) unsatisfactory pseudo labels caused by naive fusion of hand-crafted results. To address these issues, this paper exploits the part-whole relations captured by a Belief Capsule Network (BCNet) for deep USOD, realized through a multi-stream capsule routing strategy that assigns a belief score to each stream within the CapsNets architecture. To train BCNet effectively, we generate high-quality pseudo labels from multiple hand-crafted detectors via a consistency-aware fusion strategy. Concretely, a weeding-out criterion is first defined to filter out unreliable training samples based on the inter-method consistency among four hand-crafted saliency maps; a dynamic fusion mechanism then generates high-quality pseudo labels from the remaining samples for BCNet training. Experiments on five public datasets demonstrate the superiority of the proposed method. Code has been released at: https://github.com/Mirlongue/Deep-Unsupervised-Part-Whole-Relational-Visual-Saliency.
AB - Deep Supervised Salient Object Detection (SSOD) relies heavily on large-scale pixel-level annotations, which are labour-intensive to acquire at high quality. Under this constraint, deep Unsupervised Salient Object Detection (USOD) has attracted growing attention. Existing deep USOD methods mostly generate pseudo labels by fusing the results of several hand-crafted detectors, and then train a Fully Convolutional Network (FCN) on these labels to detect salient regions. Although such methods have made progress, they still fall short of satisfactory performance on complex scenes, owing to (1) poor object wholeness caused by neglecting the hierarchy of salient regions, and (2) unsatisfactory pseudo labels caused by naive fusion of hand-crafted results. To address these issues, this paper exploits the part-whole relations captured by a Belief Capsule Network (BCNet) for deep USOD, realized through a multi-stream capsule routing strategy that assigns a belief score to each stream within the CapsNets architecture. To train BCNet effectively, we generate high-quality pseudo labels from multiple hand-crafted detectors via a consistency-aware fusion strategy. Concretely, a weeding-out criterion is first defined to filter out unreliable training samples based on the inter-method consistency among four hand-crafted saliency maps; a dynamic fusion mechanism then generates high-quality pseudo labels from the remaining samples for BCNet training. Experiments on five public datasets demonstrate the superiority of the proposed method. Code has been released at: https://github.com/Mirlongue/Deep-Unsupervised-Part-Whole-Relational-Visual-Saliency.
KW - Consistency-aware fusion strategy
KW - Part-object relationship
KW - Unsupervised salient object detection
UR - http://www.scopus.com/inward/record.url?scp=85174830134&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2023.126916
DO - 10.1016/j.neucom.2023.126916
M3 - Article
AN - SCOPUS:85174830134
SN - 0925-2312
VL - 563
JO - Neurocomputing
JF - Neurocomputing
M1 - 126916
ER -