Abstract
Many encouraging studies have been conducted on RGB-D salient object detection (SOD). However, most of these methods are limited in mining single-modal features and do not fully exploit the complementarity of cross-modal features. To alleviate these issues, this study designs a potential region attention network (PRANet) for RGB-D SOD. Specifically, PRANet adopts Swin Transformer as its backbone to efficiently extract two-stream features. In addition, a potential multi-scale attention module (PMAM) is placed at the highest level of the encoder to mine intra-modal information and enhance feature expression. More importantly, a potential region attention module (PRAM) is designed to properly exploit the complementarity of cross-modal information, using potential region attention to guide two-stream feature fusion. Furthermore, a feature refinement fusion module (FRFM) refines and corrects cross-layer features to strengthen cross-layer information transmission between the encoder and decoder. Finally, multi-side supervision is applied during the training phase. Extensive experimental results on six RGB-D SOD datasets indicate that our PRANet achieves outstanding performance and outperforms 15 representative methods.
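The abstract names the main components but gives no implementation details, so the following is only a minimal PyTorch sketch of how a two-stream, attention-guided fusion pipeline of this kind is commonly wired up. Every module body here (the region-attention computation, channel sizes, per-level supervision heads) is a hypothetical placeholder rather than the paper's actual PRAM, PMAM, or FRFM; PMAM and FRFM are omitted for brevity, and plain strided convolutions stand in for the Swin Transformer backbones.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PRAM(nn.Module):
    """Hypothetical stand-in for potential-region-attention fusion of RGB and depth features."""
    def __init__(self, c):
        super().__init__()
        # Predict a single-channel "potential region" map from the concatenated streams.
        self.region = nn.Conv2d(2 * c, 1, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(2 * c, c, kernel_size=1)

    def forward(self, f_rgb, f_d):
        a = torch.sigmoid(self.region(torch.cat([f_rgb, f_d], dim=1)))  # attention map in [0, 1]
        return self.fuse(torch.cat([f_rgb * a, f_d * a], dim=1))        # attention-guided fusion

class PRANetSketch(nn.Module):
    """Two-stream encoder + per-level fusion + multi-side supervision (assumed layout)."""
    def __init__(self, chans=(96, 192, 384, 768)):
        super().__init__()
        # Placeholder encoders; the paper uses Swin Transformer backbones for both streams.
        self.enc_rgb = nn.ModuleList(
            nn.Conv2d(3 if i == 0 else chans[i - 1], c, 3, stride=2, padding=1)
            for i, c in enumerate(chans))
        self.enc_d = nn.ModuleList(
            nn.Conv2d(1 if i == 0 else chans[i - 1], c, 3, stride=2, padding=1)  # single-channel depth assumed
            for i, c in enumerate(chans))
        self.pram = nn.ModuleList(PRAM(c) for c in chans)
        self.heads = nn.ModuleList(nn.Conv2d(c, 1, 1) for c in chans)  # side-output heads for multi-side supervision

    def forward(self, rgb, depth):
        f_r, f_d, preds = rgb, depth, []
        for enc_r, enc_d, pram, head in zip(self.enc_rgb, self.enc_d, self.pram, self.heads):
            f_r, f_d = enc_r(f_r), enc_d(f_d)
            fused = pram(f_r, f_d)
            preds.append(F.interpolate(head(fused), size=rgb.shape[2:],
                                       mode="bilinear", align_corners=False))
        return preds  # every side output receives supervision during training

# Usage: saliency maps at full input resolution, one per encoder level.
model = PRANetSketch()
preds = model(torch.rand(1, 3, 224, 224), torch.rand(1, 1, 224, 224))
```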
Original language | English |
---|---|
Article number | 107620 |
Journal | Neural Networks |
Volume | 190 |
State | Published - Oct 2025 |
Keywords
- Multi-scale attention
- Region attention
- RGB-D salient object detection
- Transformer