Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection

Nian Liu, Ni Zhang, Ling Shao, Junwei Han

Research output: Contribution to journal › Article › peer-review

81 Citations (Scopus)

Abstract

How to effectively fuse cross-modal information is a key problem for RGB-D salient object detection. Early fusion and result fusion schemes fuse RGB and depth information at the input and output stages, respectively, and hence incur distribution gaps or information loss. Many models instead employ a feature fusion strategy, but they are limited by their use of low-order point-to-point fusion methods. In this paper, we propose a novel mutual attention model that fuses attention and context from different modalities. We use the non-local attention of one modality to propagate long-range contextual dependencies for the other, thus leveraging complementary attention cues to achieve high-order and trilinear cross-modal interaction. We also propose to induce contrast inference from the mutual attention, yielding a unified model. Considering that low-quality depth data may be detrimental to model performance, we further propose a selective attention mechanism to reweight the added depth cues. We embed the proposed modules in a two-stream CNN for RGB-D SOD. Experimental results demonstrate the effectiveness of the proposed model. Moreover, we construct a new, challenging, and high-quality large-scale RGB-D SOD dataset, which can promote both the training and evaluation of deep models.
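To make the mutual attention idea concrete, the sketch below shows one plausible PyTorch layout in which the non-local attention map computed from one modality is used to aggregate the value features of the other, so each modality propagates long-range context for its counterpart. This is only a reading of the abstract, not the authors' implementation: the module name, the projection layers, and the query/key/value pairing are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MutualNonLocalAttention(nn.Module):
    """Illustrative sketch of cross-modal mutual attention (not the paper's code).

    The non-local attention map computed from one modality is used to
    propagate long-range contextual dependencies for the other modality.
    """

    def __init__(self, channels, reduced_channels=None):
        super().__init__()
        reduced = reduced_channels or channels // 2
        # Per-modality query/key/value projections (hypothetical layout).
        self.q_rgb = nn.Conv2d(channels, reduced, kernel_size=1)
        self.k_rgb = nn.Conv2d(channels, reduced, kernel_size=1)
        self.v_rgb = nn.Conv2d(channels, channels, kernel_size=1)
        self.q_d = nn.Conv2d(channels, reduced, kernel_size=1)
        self.k_d = nn.Conv2d(channels, reduced, kernel_size=1)
        self.v_d = nn.Conv2d(channels, channels, kernel_size=1)

    @staticmethod
    def _attend(q, k, v):
        # q, k: B x C' x H x W; v: B x C x H x W
        b, c, h, w = v.shape
        q = q.flatten(2).transpose(1, 2)   # B x HW x C'
        k = k.flatten(2)                   # B x C' x HW
        attn = F.softmax(q @ k, dim=-1)    # B x HW x HW non-local attention map
        v = v.flatten(2).transpose(1, 2)   # B x HW x C
        return (attn @ v).transpose(1, 2).reshape(b, c, h, w)

    def forward(self, f_rgb, f_d):
        # Depth attention aggregates RGB values (context for the RGB stream),
        # and RGB attention aggregates depth values (context for the depth stream).
        rgb_ctx = self._attend(self.q_d(f_d), self.k_d(f_d), self.v_rgb(f_rgb))
        d_ctx = self._attend(self.q_rgb(f_rgb), self.k_rgb(f_rgb), self.v_d(f_d))
        return f_rgb + rgb_ctx, f_d + d_ctx


# Toy usage: paired RGB and depth feature maps of the same spatial size.
if __name__ == "__main__":
    module = MutualNonLocalAttention(channels=64)
    f_rgb = torch.randn(2, 64, 16, 16)
    f_d = torch.randn(2, 64, 16, 16)
    out_rgb, out_d = module(f_rgb, f_d)
    print(out_rgb.shape, out_d.shape)
```

The selective reweighting of depth cues mentioned in the abstract would sit on top of such a module, for example by scaling the depth-derived context with a learned quality gate; it is omitted from this sketch.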

Original language: English
Pages (from-to): 9026-9042
Number of pages: 17
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 44
Issue number: 12
DOI
Publication status: Published - 1 Dec 2022
