TY - JOUR
T1 - Fusion-Embedding Siamese Network for Light Field Salient Object Detection
AU - Chen, Geng
AU - Fu, Huazhu
AU - Zhou, Tao
AU - Xiao, Guobao
AU - Fu, Keren
AU - Xia, Yong
AU - Zhang, Yanning
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Light field salient object detection (SOD) has shown remarkable success and gained considerable attention from the computer vision community. Existing methods usually employ a single- or two-stream network to detect saliency. However, these methods can handle at most two modalities at a time, preventing them from fully exploring the rich information in multi-modal data derived from light fields. To address this, we propose the first joint multi-modal learning framework for light field SOD, called FES-Net, which can take rich inputs that are not limited to two modalities. Specifically, we propose an attention-aware adaptation module that first transforms the multi-modal inputs for use in our joint learning framework. The transformed inputs are then fed to a Siamese network equipped with multiple embedded feature fusion modules to extract informative multi-modal features. Finally, we predict saliency maps from the extracted high-level features using a saliency decoder module. Our joint multi-modal learning framework effectively resolves the limitations of existing methods, providing efficient and effective multi-modal learning that fully exploits the valuable information in light field data for accurate saliency detection. Furthermore, we improve performance by adopting a Transformer as our backbone network. To the best of our knowledge, the improved version of our model, called FES-Trans, is the first attempt to address challenging light field SOD with the powerful Transformer technique. Extensive experiments on benchmark datasets demonstrate that our models are superior light field SOD approaches and remarkably outperform cutting-edge models.
AB - Light field salient object detection (SOD) has shown remarkable success and gained considerable attention from the computer vision community. Existing methods usually employ a single- or two-stream network to detect saliency. However, these methods can handle at most two modalities at a time, preventing them from fully exploring the rich information in multi-modal data derived from light fields. To address this, we propose the first joint multi-modal learning framework for light field SOD, called FES-Net, which can take rich inputs that are not limited to two modalities. Specifically, we propose an attention-aware adaptation module that first transforms the multi-modal inputs for use in our joint learning framework. The transformed inputs are then fed to a Siamese network equipped with multiple embedded feature fusion modules to extract informative multi-modal features. Finally, we predict saliency maps from the extracted high-level features using a saliency decoder module. Our joint multi-modal learning framework effectively resolves the limitations of existing methods, providing efficient and effective multi-modal learning that fully exploits the valuable information in light field data for accurate saliency detection. Furthermore, we improve performance by adopting a Transformer as our backbone network. To the best of our knowledge, the improved version of our model, called FES-Trans, is the first attempt to address challenging light field SOD with the powerful Transformer technique. Extensive experiments on benchmark datasets demonstrate that our models are superior light field SOD approaches and remarkably outperform cutting-edge models.
KW - Light field
KW - multi-modal learning
KW - salient object detection
KW - siamese network
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85159839633&partnerID=8YFLogxK
U2 - 10.1109/TMM.2023.3274933
DO - 10.1109/TMM.2023.3274933
M3 - Article
AN - SCOPUS:85159839633
SN - 1520-9210
VL - 26
SP - 984
EP - 994
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -