Rethinking feature aggregation for deep RGB-D salient object detection

Yuan fang Zhang; Jiangbin Zheng; Long Li; Nian Liu; Wenjing Jia; Xiaochen Fan; Chengpei Xu; Xiangjian He

doi:10.1016/j.neucom.2020.10.079

School of Software

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)

Abstract

Two-stream UNet based architectures are widely used in deep RGB-D salient object detection (SOD) models. However, UNet only adopts a top-down decoder network to progressively aggregate high-level features with low-level ones. In this paper, we propose to enrich feature aggregation via holistic aggregation paths and an extra bottom-up decoder network. The former aggregates multi-level features holistically to learn abundant feature interactions while the latter aggregates improved low-level features with high-level features, thus promoting their representation ability. Aiming at the two-stream architecture, we propose another early aggregation scheme to aggregate and propagate multi-modal encoder features at each level, thereby improving the encoder capability. We also propose a factorized attention module to efficiently modulate the feature aggregation action for each feature node with multiple learned attention factors. Experimental results demonstrate that all of the proposed components can gradually improve RGB-D SOD results. Consequently, our final SOD model performs favorably against other state-of-the-art methods.
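The two decoder directions named in the abstract can be sketched with a toy example. This is only an illustration under stated assumptions, not the authors' model: the feature maps are 1-D Python lists, aggregation is plain element-wise addition, and the helper names (`upsample`, `downsample`, `top_down`, `bottom_up`) are hypothetical. The holistic aggregation paths, early multi-modal aggregation, and factorized attention module are not modeled here.

```python
# Toy 1-D "feature maps": index 0 is the finest (low-level) level,
# the last index is the coarsest (high-level) level.
# All operations are illustrative assumptions, not the paper's layers.

def upsample(x):
    """Nearest-neighbour upsampling by a factor of 2."""
    return [v for v in x for _ in range(2)]

def downsample(x):
    """Average pooling with window 2 and stride 2."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]

def top_down(feats):
    """UNet-style decoder: merge high-level features into low-level ones."""
    out = [None] * len(feats)
    out[-1] = feats[-1]
    for i in range(len(feats) - 2, -1, -1):
        up = upsample(out[i + 1])
        out[i] = [a + b for a, b in zip(feats[i], up)]
    return out

def bottom_up(feats):
    """Extra pass in the opposite direction: merge the improved
    low-level features back into the high-level ones."""
    out = [None] * len(feats)
    out[0] = feats[0]
    for i in range(1, len(feats)):
        down = downsample(out[i - 1])
        out[i] = [a + b for a, b in zip(feats[i], down)]
    return out

# Hypothetical three-level encoder output (lengths 4, 2, 1).
encoder = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0], [100.0]]
decoded = top_down(encoder)   # top-down decoder only
refined = bottom_up(decoded)  # followed by the bottom-up pass
```

In this sketch the coarsest level of `refined` depends on every finer level, which is the effect the abstract attributes to the extra bottom-up decoder; a real model would use learned convolutions rather than addition.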

Original language	English
Pages (from-to)	463-473
Number of pages	11
Journal	Neurocomputing
Volume	423
DOI	https://doi.org/10.1016/j.neucom.2020.10.079
Publication status	Published - 29 Jan 2021

Cite this

@article{914de7dcfe374be186d1835afc24bd7a,

title = "Rethinking feature aggregation for deep RGB-D salient object detection",

abstract = "Two-stream UNet based architectures are widely used in deep RGB-D salient object detection (SOD) models. However, UNet only adopts a top-down decoder network to progressively aggregate high-level features with low-level ones. In this paper, we propose to enrich feature aggregation via holistic aggregation paths and an extra bottom-up decoder network. The former aggregates multi-level features holistically to learn abundant feature interactions while the latter aggregates improved low-level features with high-level features, thus promoting their representation ability. Aiming at the two-stream architecture, we propose another early aggregation scheme to aggregate and propagate multi-modal encoder features at each level, thereby improving the encoder capability. We also propose a factorized attention module to efficiently modulate the feature aggregation action for each feature node with multiple learned attention factors. Experimental results demonstrate that all of the proposed components can gradually improve RGB-D SOD results. Consequently, our final SOD model performs favorably against other state-of-the-art methods.",

keywords = "Feature aggregation, Gated attention, RGB-D saliency detection, UNet",

author = "Zhang, {Yuan fang} and Jiangbin Zheng and Long Li and Nian Liu and Wenjing Jia and Xiaochen Fan and Chengpei Xu and Xiangjian He",

note = "Publisher Copyright: {\textcopyright} 2020",

year = "2021",

month = jan,

day = "29",

doi = "10.1016/j.neucom.2020.10.079",

language = "English",

volume = "423",

pages = "463--473",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Rethinking feature aggregation for deep RGB-D salient object detection

AU - Zhang, Yuan fang

AU - Zheng, Jiangbin

AU - Li, Long

AU - Liu, Nian

AU - Jia, Wenjing

AU - Fan, Xiaochen

AU - Xu, Chengpei

AU - He, Xiangjian

PY - 2021/1/29

Y1 - 2021/1/29

N2 - Two-stream UNet based architectures are widely used in deep RGB-D salient object detection (SOD) models. However, UNet only adopts a top-down decoder network to progressively aggregate high-level features with low-level ones. In this paper, we propose to enrich feature aggregation via holistic aggregation paths and an extra bottom-up decoder network. The former aggregates multi-level features holistically to learn abundant feature interactions while the latter aggregates improved low-level features with high-level features, thus promoting their representation ability. Aiming at the two-stream architecture, we propose another early aggregation scheme to aggregate and propagate multi-modal encoder features at each level, thereby improving the encoder capability. We also propose a factorized attention module to efficiently modulate the feature aggregation action for each feature node with multiple learned attention factors. Experimental results demonstrate that all of the proposed components can gradually improve RGB-D SOD results. Consequently, our final SOD model performs favorably against other state-of-the-art methods.

AB - Two-stream UNet based architectures are widely used in deep RGB-D salient object detection (SOD) models. However, UNet only adopts a top-down decoder network to progressively aggregate high-level features with low-level ones. In this paper, we propose to enrich feature aggregation via holistic aggregation paths and an extra bottom-up decoder network. The former aggregates multi-level features holistically to learn abundant feature interactions while the latter aggregates improved low-level features with high-level features, thus promoting their representation ability. Aiming at the two-stream architecture, we propose another early aggregation scheme to aggregate and propagate multi-modal encoder features at each level, thereby improving the encoder capability. We also propose a factorized attention module to efficiently modulate the feature aggregation action for each feature node with multiple learned attention factors. Experimental results demonstrate that all of the proposed components can gradually improve RGB-D SOD results. Consequently, our final SOD model performs favorably against other state-of-the-art methods.

KW - Feature aggregation

KW - Gated attention

KW - RGB-D saliency detection

KW - UNet

UR - http://www.scopus.com/inward/record.url?scp=85097067969&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2020.10.079

DO - 10.1016/j.neucom.2020.10.079

M3 - Article

AN - SCOPUS:85097067969

SN - 0925-2312

VL - 423

SP - 463

EP - 473

JO - Neurocomputing

JF - Neurocomputing

ER -
