Potential region attention network for RGB-D salient object detection

Dawei Song, Yuan Yuan, Xuelong Li

Research output: Contribution to journal › Article › peer-review

Abstract

Many encouraging investigations have already been conducted on RGB-D salient object detection (SOD). However, most existing methods are limited in how well they mine single-modal features and do not fully exploit the complementarity of cross-modal features. To alleviate these issues, this study designs a potential region attention network (PRANet) for RGB-D SOD. Specifically, PRANet adopts Swin Transformer as its backbone to efficiently obtain two-stream features. In addition, a potential multi-scale attention module (PMAM) is placed at the highest level of the encoder to mine intra-modal information and enhance feature expression. More importantly, a potential region attention module (PRAM) is designed to exploit the complementarity of cross-modal information; it uses a potential region attention to guide two-stream feature fusion. Furthermore, a feature refinement fusion module (FRFM) refines and corrects cross-layer features to strengthen cross-layer information transmission between the encoder and decoder. Finally, multi-side supervision is used during the training phase. Extensive experiments on 6 RGB-D SOD datasets indicate that PRANet achieves outstanding performance and outperforms 15 representative methods.
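The multi-side supervision mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss function, the number of side outputs, or their weights, so everything below is an assumption: binary cross-entropy is used as a stand-in loss, the side outputs are assumed to be already resized to the ground-truth resolution, and the names `bce` and `multi_side_loss` are hypothetical. It only shows the general idea of summing a per-output loss over several decoder side outputs, not the authors' implementation.

```python
import math

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two equally sized flat lists of values in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)

def multi_side_loss(side_preds, target, weights=None):
    """Weighted sum of BCE over every side output.

    Equal weights are an assumption made for this sketch; the abstract
    does not state how side outputs are weighted.
    """
    if weights is None:
        weights = [1.0] * len(side_preds)
    return sum(w * bce(p, target) for w, p in zip(weights, side_preds))

# Toy example: three hypothetical side outputs for a 4-pixel saliency map.
target = [1.0, 0.0, 1.0, 0.0]
side_preds = [
    [0.9, 0.1, 0.8, 0.2],  # confident output
    [0.7, 0.3, 0.6, 0.4],  # less confident output
    [0.5, 0.5, 0.5, 0.5],  # uninformative output
]
loss = multi_side_loss(side_preds, target)
print(loss)
```

Supervising every side output, rather than only the final prediction, gives each decoder level its own gradient signal during training; at inference time typically only the final prediction is kept.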

Original language: English
Article number: 107620
Journal: Neural Networks
Volume: 190
State: Published - Oct 2025

Keywords

  • Multi-scale attention
  • Region attention
  • RGB-D salient object detection
  • Transformer
