Rethinking feature aggregation for deep RGB-D salient object detection

Yuan fang Zhang; Jiangbin Zheng; Long Li; Nian Liu; Wenjing Jia; Xiaochen Fan; Chengpei Xu; Xiangjian He

doi:10.1016/j.neucom.2020.10.079

School of Software

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Two-stream UNet based architectures are widely used in deep RGB-D salient object detection (SOD) models. However, UNet only adopts a top-down decoder network to progressively aggregate high-level features with low-level ones. In this paper, we propose to enrich feature aggregation via holistic aggregation paths and an extra bottom-up decoder network. The former aggregates multi-level features holistically to learn abundant feature interactions while the latter aggregates improved low-level features with high-level features, thus promoting their representation ability. Aiming at the two-stream architecture, we propose another early aggregation scheme to aggregate and propagate multi-modal encoder features at each level, thereby improving the encoder capability. We also propose a factorized attention module to efficiently modulate the feature aggregation action for each feature node with multiple learned attention factors. Experimental results demonstrate that all of the proposed components can gradually improve RGB-D SOD results. Consequently, our final SOD model performs favorably against other state-of-the-art methods.

Original language	English
Pages (from-to)	463-473
Number of pages	11
Journal	Neurocomputing
Volume	423
DOIs	https://doi.org/10.1016/j.neucom.2020.10.079
State	Published - 29 Jan 2021

Keywords

Feature aggregation
Gated attention
RGB-D saliency detection
UNet

Cite this

@article{914de7dcfe374be186d1835afc24bd7a,

title = "Rethinking feature aggregation for deep RGB-D salient object detection",

abstract = "Two-stream UNet based architectures are widely used in deep RGB-D salient object detection (SOD) models. However, UNet only adopts a top-down decoder network to progressively aggregate high-level features with low-level ones. In this paper, we propose to enrich feature aggregation via holistic aggregation paths and an extra bottom-up decoder network. The former aggregates multi-level features holistically to learn abundant feature interactions while the latter aggregates improved low-level features with high-level features, thus promoting their representation ability. Aiming at the two-stream architecture, we propose another early aggregation scheme to aggregate and propagate multi-modal encoder features at each level, thereby improving the encoder capability. We also propose a factorized attention module to efficiently modulate the feature aggregation action for each feature node with multiple learned attention factors. Experimental results demonstrate that all of the proposed components can gradually improve RGB-D SOD results. Consequently, our final SOD model performs favorably against other state-of-the-art methods.",

keywords = "Feature aggregation, Gated attention, RGB-D saliency detection, UNet",

author = "Zhang, {Yuan fang} and Jiangbin Zheng and Long Li and Nian Liu and Wenjing Jia and Xiaochen Fan and Chengpei Xu and Xiangjian He",

note = "Publisher Copyright: {\textcopyright} 2020",

year = "2021",

month = jan,

day = "29",

doi = "10.1016/j.neucom.2020.10.079",

language = "English",

volume = "423",

pages = "463--473",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Rethinking feature aggregation for deep RGB-D salient object detection

AU - Zhang, Yuan fang

AU - Zheng, Jiangbin

AU - Li, Long

AU - Liu, Nian

AU - Jia, Wenjing

AU - Fan, Xiaochen

AU - Xu, Chengpei

AU - He, Xiangjian

PY - 2021/1/29

Y1 - 2021/1/29

AB - Two-stream UNet based architectures are widely used in deep RGB-D salient object detection (SOD) models. However, UNet only adopts a top-down decoder network to progressively aggregate high-level features with low-level ones. In this paper, we propose to enrich feature aggregation via holistic aggregation paths and an extra bottom-up decoder network. The former aggregates multi-level features holistically to learn abundant feature interactions while the latter aggregates improved low-level features with high-level features, thus promoting their representation ability. Aiming at the two-stream architecture, we propose another early aggregation scheme to aggregate and propagate multi-modal encoder features at each level, thereby improving the encoder capability. We also propose a factorized attention module to efficiently modulate the feature aggregation action for each feature node with multiple learned attention factors. Experimental results demonstrate that all of the proposed components can gradually improve RGB-D SOD results. Consequently, our final SOD model performs favorably against other state-of-the-art methods.

KW - Feature aggregation

KW - Gated attention

KW - RGB-D saliency detection

KW - UNet

UR - http://www.scopus.com/inward/record.url?scp=85097067969&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2020.10.079

DO - 10.1016/j.neucom.2020.10.079

M3 - Article

AN - SCOPUS:85097067969

SN - 0925-2312

VL - 423

SP - 463

EP - 473

JO - Neurocomputing

JF - Neurocomputing

ER -
