Abstract
Many encouraging studies have addressed RGB-D salient object detection (SOD). However, most existing methods are limited in how well they mine single-modal features, and they do not fully exploit the complementarity of cross-modal features. To alleviate these issues, this study proposes a potential region attention network (PRANet) for RGB-D SOD. Specifically, PRANet adopts Swin Transformer as its backbone to efficiently extract two-stream features. A potential multi-scale attention module (PMAM) is placed at the highest level of the encoder to mine intra-modal information and enhance feature representation. More importantly, a potential region attention module (PRAM) is designed to exploit the complementarity of cross-modal information; it uses potential region attention to guide two-stream feature fusion. In addition, a feature refinement fusion module (FRFM) refines and corrects cross-layer features to strengthen information transmission between the encoder and decoder. Finally, multi-side supervision is used during training. Extensive experiments on 6 RGB-D SOD datasets indicate that PRANet achieves outstanding performance and outperforms 15 representative methods.
Original language | English |
---|---|
Article number | 107620 |
Journal | Neural Networks |
Volume | 190 |
DOI | |
Publication status | Published - Oct 2025 |