Deep RGB-D Saliency Detection Without Depth

Yuan Fang Zhang, Jiangbin Zheng, Wenjing Jia, Wenfeng Huang, Long Li, Nian Liu, Fei Li, Xiangjian He

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)

Abstract

Existing saliency detection models based on RGB colors leverage only appearance cues to detect salient objects. Depth information also plays a very important role in visual saliency detection and can supply complementary cues. Although many RGB-D saliency models have been proposed, they require acquiring depth data, which is expensive and difficult to obtain. In this paper, we propose to estimate depth information from monocular RGB images and leverage the intermediate depth features to enhance saliency detection performance in a deep neural network framework. Specifically, we first use an encoder network to extract common features from each RGB image, and then build two decoder networks, one for depth estimation and one for saliency detection. The depth decoder features can be fused with the RGB saliency features to enhance their representational capability. Furthermore, we propose a novel dense multiscale fusion model, based on the dense ASPP model, to densely fuse multiscale depth and RGB features, and add a new global context branch to boost the multiscale features. Experimental results demonstrate that both the added depth cues and the proposed fusion model improve saliency detection performance. Finally, our model not only outperforms state-of-the-art RGB saliency models, but also achieves results comparable with state-of-the-art RGB-D saliency models.
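The pipeline the abstract describes — a shared encoder, a depth decoder whose intermediate features are fused into a saliency decoder, and a densely connected multiscale (dense-ASPP-style) fusion module with a global context branch — can be sketched as below. This is a minimal illustrative PyTorch sketch, not the authors' implementation; all module names, channel widths, and the simple concat-based fusion are assumptions made for clarity.

```python
import torch
import torch.nn as nn


class DenseASPPFusion(nn.Module):
    """Sketch of dense multiscale fusion (assumption, inspired by dense ASPP):
    stacked dilated convolutions whose inputs are densely concatenated,
    plus a global-context branch built from global average pooling."""

    def __init__(self, ch):
        super().__init__()
        # dilated convs at increasing rates; each one sees all earlier outputs
        self.d1 = nn.Conv2d(ch, ch, 3, padding=1, dilation=1)
        self.d2 = nn.Conv2d(ch * 2, ch, 3, padding=2, dilation=2)
        self.d3 = nn.Conv2d(ch * 3, ch, 3, padding=4, dilation=4)
        # global context branch: pool to 1x1, project, broadcast back
        self.gc = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1))
        self.proj = nn.Conv2d(ch * 4, ch, 1)

    def forward(self, x):
        y1 = torch.relu(self.d1(x))
        y2 = torch.relu(self.d2(torch.cat([x, y1], 1)))
        y3 = torch.relu(self.d3(torch.cat([x, y1, y2], 1)))
        g = self.gc(x).expand_as(x)  # broadcast global context over the map
        return self.proj(torch.cat([y1, y2, y3, g], 1))


class DepthAwareSaliencyNet(nn.Module):
    """Hypothetical sketch: one shared encoder, a depth decoder, and a
    saliency decoder that consumes RGB features fused with the depth
    decoder's intermediate features."""

    def __init__(self, ch=32):
        super().__init__()
        # shared encoder extracting common features from the RGB image
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        # depth branch: intermediate features, then a 1-channel depth head
        self.depth_feat = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.depth_head = nn.Conv2d(ch, 1, 1)
        # fuse RGB and intermediate depth features, then decode saliency
        self.reduce = nn.Conv2d(ch * 2, ch, 1)
        self.fuse = DenseASPPFusion(ch)
        self.sal_head = nn.Conv2d(ch, 1, 1)
        self.up = nn.Upsample(scale_factor=4, mode="bilinear",
                              align_corners=False)

    def forward(self, rgb):
        f = self.encoder(rgb)                      # common features
        d = self.depth_feat(f)                     # intermediate depth features
        depth = self.up(self.depth_head(d))        # predicted depth map
        s = self.fuse(self.reduce(torch.cat([f, d], 1)))
        sal = self.up(self.sal_head(s))            # predicted saliency map
        return sal, depth
```

At inference time only the RGB image is needed; the depth branch is supervised during training and supplies complementary features, which is the point of "RGB-D saliency detection without depth".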

Original language: English
Pages (from-to): 755-767
Number of pages: 13
Journal: IEEE Transactions on Multimedia
Volume: 24
DOI
Publication status: Published - 2022
