Potential region attention network for RGB-D salient object detection

Dawei Song; Yuan Yuan; Xuelong Li

doi:10.1016/j.neunet.2025.107620

Institute of Optoelectronics and Intelligence (光电与智能研究院)

Research output: Contribution to journal › Article › peer-review

Abstract

Many encouraging investigations have already been conducted on RGB-D salient object detection (SOD). However, most of these methods are limited in mining single-modal features and have not fully utilized the appropriate complementarity of cross-modal features. To alleviate the issues, this study designs a potential region attention network (PRANet) for RGB-D SOD. Specifically, the PRANet adopts Swin Transformer as its backbone to efficiently obtain two-stream features. Besides, a potential multi-scale attention module (PMAM) is equipped at the highest level of the encoder, which is beneficial for mining intra-modal information and enhancing feature expression. More importantly, a potential region attention module (PRAM) is designed to properly utilize the complementarity of cross-modal information, which adopts a potential region attention to guide two-stream feature fusion. In addition, by refining and correcting cross-layer features, a feature refinement fusion module (FRFM) is designed to strengthen the cross-layer information transmission between the encoder and decoder. Finally, the multi-side supervision is used during the training phase. Sufficient experimental results on 6 RGB-D SOD datasets indicate that our PRANet has achieved outstanding performance and is superior to 15 representative methods.

Original language	English
Article number	107620
Journal	Neural Networks
Volume	190
DOI	https://doi.org/10.1016/j.neunet.2025.107620
Publication status	Published - Oct 2025

Cite this

@article{12345cc7415e4b209449b339cc2d6cb0,

title = "Potential region attention network for RGB-D salient object detection",

abstract = "Many encouraging investigations have already been conducted on RGB-D salient object detection (SOD). However, most of these methods are limited in mining single-modal features and have not fully utilized the appropriate complementarity of cross-modal features. To alleviate the issues, this study designs a potential region attention network (PRANet) for RGB-D SOD. Specifically, the PRANet adopts Swin Transformer as its backbone to efficiently obtain two-stream features. Besides, a potential multi-scale attention module (PMAM) is equipped at the highest level of the encoder, which is beneficial for mining intra-modal information and enhancing feature expression. More importantly, a potential region attention module (PRAM) is designed to properly utilize the complementarity of cross-modal information, which adopts a potential region attention to guide two-stream feature fusion. In addition, by refining and correcting cross-layer features, a feature refinement fusion module (FRFM) is designed to strengthen the cross-layer information transmission between the encoder and decoder. Finally, the multi-side supervision is used during the training phase. Sufficient experimental results on 6 RGB-D SOD datasets indicate that our PRANet has achieved outstanding performance and is superior to 15 representative methods.",

keywords = "Multi-scale attention, Region attention, RGB-D salient object detection, Transformer",

author = "Dawei Song and Yuan Yuan and Xuelong Li",

note = "Publisher Copyright: {\textcopyright} 2025 Elsevier Ltd",

year = "2025",

month = oct,

doi = "10.1016/j.neunet.2025.107620",

language = "English",

volume = "190",

journal = "Neural Networks",

issn = "0893-6080",

publisher = "Elsevier Ltd",

}

TY - JOUR

T1 - Potential region attention network for RGB-D salient object detection

AU - Song, Dawei

AU - Yuan, Yuan

AU - Li, Xuelong

PY - 2025/10

Y1 - 2025/10

N2 - Many encouraging investigations have already been conducted on RGB-D salient object detection (SOD). However, most of these methods are limited in mining single-modal features and have not fully utilized the appropriate complementarity of cross-modal features. To alleviate the issues, this study designs a potential region attention network (PRANet) for RGB-D SOD. Specifically, the PRANet adopts Swin Transformer as its backbone to efficiently obtain two-stream features. Besides, a potential multi-scale attention module (PMAM) is equipped at the highest level of the encoder, which is beneficial for mining intra-modal information and enhancing feature expression. More importantly, a potential region attention module (PRAM) is designed to properly utilize the complementarity of cross-modal information, which adopts a potential region attention to guide two-stream feature fusion. In addition, by refining and correcting cross-layer features, a feature refinement fusion module (FRFM) is designed to strengthen the cross-layer information transmission between the encoder and decoder. Finally, the multi-side supervision is used during the training phase. Sufficient experimental results on 6 RGB-D SOD datasets indicate that our PRANet has achieved outstanding performance and is superior to 15 representative methods.

AB - Many encouraging investigations have already been conducted on RGB-D salient object detection (SOD). However, most of these methods are limited in mining single-modal features and have not fully utilized the appropriate complementarity of cross-modal features. To alleviate the issues, this study designs a potential region attention network (PRANet) for RGB-D SOD. Specifically, the PRANet adopts Swin Transformer as its backbone to efficiently obtain two-stream features. Besides, a potential multi-scale attention module (PMAM) is equipped at the highest level of the encoder, which is beneficial for mining intra-modal information and enhancing feature expression. More importantly, a potential region attention module (PRAM) is designed to properly utilize the complementarity of cross-modal information, which adopts a potential region attention to guide two-stream feature fusion. In addition, by refining and correcting cross-layer features, a feature refinement fusion module (FRFM) is designed to strengthen the cross-layer information transmission between the encoder and decoder. Finally, the multi-side supervision is used during the training phase. Sufficient experimental results on 6 RGB-D SOD datasets indicate that our PRANet has achieved outstanding performance and is superior to 15 representative methods.

KW - Multi-scale attention

KW - Region attention

KW - RGB-D salient object detection

KW - Transformer

UR - http://www.scopus.com/inward/record.url?scp=105006771622&partnerID=8YFLogxK

U2 - 10.1016/j.neunet.2025.107620

DO - 10.1016/j.neunet.2025.107620

M3 - Article

AN - SCOPUS:105006771622

SN - 0893-6080

VL - 190

JO - Neural Networks

JF - Neural Networks

M1 - 107620

ER -
