TY - JOUR
T1 - EM-Trans
T2 - Edge-Aware Multimodal Transformer for RGB-D Salient Object Detection
AU - Chen, Geng
AU - Wang, Qingyue
AU - Dong, Bo
AU - Ma, Ruitao
AU - Liu, Nian
AU - Fu, Huazhu
AU - Xia, Yong
N1 - Publisher Copyright:
© 2012 IEEE.
PY - 2025
Y1 - 2025
AB - RGB-D salient object detection (SOD) has gained tremendous attention in recent years. In particular, transformers have been employed and have shown great potential. However, existing transformer models usually overlook the vital edge information, which is a major issue restricting the further improvement of SOD accuracy. To this end, we propose a novel edge-aware RGB-D SOD transformer, called EM-Trans, which explicitly models the edge information in a dual-band decomposition framework. Specifically, we employ two parallel decoder networks to learn the high-frequency edge and low-frequency body features from the low- and high-level features extracted from a two-stream multimodal backbone network, respectively. Next, we propose a cross-attention complementarity exploration module to enrich the edge/body features by exploiting the multimodal complementarity information. The refined features are then fed into our proposed color-hint guided fusion module for enhancing the depth feature and fusing the multimodal features. Finally, the resulting features are fused using our deeply supervised progressive fusion module, which progressively integrates edge and body features for predicting saliency maps. Our model explicitly considers the edge information for accurate RGB-D SOD, overcoming the limitations of existing methods and effectively improving the performance. Extensive experiments on benchmark datasets demonstrate that EM-Trans is an effective RGB-D SOD framework that outperforms the current state-of-the-art models, both quantitatively and qualitatively. A further extension to RGB-T SOD demonstrates the promising potential of our model in various kinds of multimodal SOD tasks.
KW - Edge-aware model
KW - multimodal learning
KW - salient object detection (SOD)
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85187260239&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2024.3358858
DO - 10.1109/TNNLS.2024.3358858
M3 - Article
AN - SCOPUS:85187260239
SN - 2162-237X
VL - 36
SP - 3175
EP - 3188
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 2
ER -