Transcending Pixels: Boosting Saliency Detection via Scene Understanding From Aerial Imagery

Yanfeng Liu; Zhitong Xiong; Yuan Yuan; Qi Wang

doi:10.1109/TGRS.2023.3298661

School of Artificial Intelligence, Optics and Electronics

Research output: Contribution to journal › Article › peer-review

55 Scopus citations

Abstract

Existing remote sensing image salient object detection (RSI-SOD) methods widely perform object-level semantic understanding with pixel-level supervision, but ignore the image-level scene information. As a fundamental attribute of remote sensing images (RSIs), the scene has a complex intrinsic correlation with salient objects, which may bring hints to improve saliency detection performance. However, existing RSI-SOD datasets lack both pixel- and image-level labels, and it is non-trivial to effectively transfer the scene domain knowledge for more accurate saliency localization. To address these challenges, we first annotate the image-level scene labels of three RSI-SOD datasets inspired by remote sensing scene classification. On top of it, we present a novel scene-guided dual-branch network (SDNet), which can perform cross-task knowledge distillation from the scene classification to facilitate accurate saliency detection. Specifically, a scene knowledge transfer module (SKTM) and a conditional dynamic guidance module (CDGM) are designed for extracting saliency key area as spatial attention from the scene subnet and guiding the saliency subnet to generate scene-enhanced saliency features, respectively. Finally, an object contour awareness module (OCAM) is introduced to enable the model to focus more on irregular spatial details of salient objects from the complicated background. Extensive experiments reveal that our SDNet outperforms over 20 state-of-the-art algorithms on three datasets. Moreover, we prove that the proposed framework is model-agnostic, and its extension to six baselines can bring significant performance benefits. Code is available at https://github.com/lyf0801/SDNet.

Original language	English
Article number	5616416
Journal	IEEE Transactions on Geoscience and Remote Sensing
Volume	61
DOIs	https://doi.org/10.1109/TGRS.2023.3298661
State	Published - 2023

Keywords

Conditional guidance learning
dynamic class activation map (CAM)
optical remote sensing image (RSI)
salient object detection (SOD)
scene knowledge distillation

Cite this

@article{2f3c328b5b064215b61a9a784d088379,

title = "Transcending Pixels: Boosting Saliency Detection via Scene Understanding From Aerial Imagery",

abstract = "Existing remote sensing image salient object detection (RSI-SOD) methods widely perform object-level semantic understanding with pixel-level supervision, but ignore the image-level scene information. As a fundamental attribute of remote sensing images (RSIs), the scene has a complex intrinsic correlation with salient objects, which may bring hints to improve saliency detection performance. However, existing RSI-SOD datasets lack both pixel- and image-level labels, and it is non-trivial to effectively transfer the scene domain knowledge for more accurate saliency localization. To address these challenges, we first annotate the image-level scene labels of three RSI-SOD datasets inspired by remote sensing scene classification. On top of it, we present a novel scene-guided dual-branch network (SDNet), which can perform cross-task knowledge distillation from the scene classification to facilitate accurate saliency detection. Specifically, a scene knowledge transfer module (SKTM) and a conditional dynamic guidance module (CDGM) are designed for extracting saliency key area as spatial attention from the scene subnet and guiding the saliency subnet to generate scene-enhanced saliency features, respectively. Finally, an object contour awareness module (OCAM) is introduced to enable the model to focus more on irregular spatial details of salient objects from the complicated background. Extensive experiments reveal that our SDNet outperforms over 20 state-of-the-art algorithms on three datasets. Moreover, we prove that the proposed framework is model-agnostic, and its extension to six baselines can bring significant performance benefits. Code is available at https://github.com/lyf0801/SDNet.",

keywords = "Conditional guidance learning, dynamic class activation map (CAM), optical remote sensing image (RSI), salient object detection (SOD), scene knowledge distillation",

author = "Yanfeng Liu and Zhitong Xiong and Yuan Yuan and Qi Wang",

note = "Publisher Copyright: {\textcopyright} 1980-2012 IEEE.",

year = "2023",

doi = "10.1109/TGRS.2023.3298661",

language = "English",

volume = "61",

journal = "IEEE Transactions on Geoscience and Remote Sensing",

issn = "0196-2892",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Transcending Pixels

T2 - Boosting Saliency Detection via Scene Understanding From Aerial Imagery

AU - Liu, Yanfeng

AU - Xiong, Zhitong

AU - Yuan, Yuan

AU - Wang, Qi

PY - 2023

Y1 - 2023

N2 - Existing remote sensing image salient object detection (RSI-SOD) methods widely perform object-level semantic understanding with pixel-level supervision, but ignore the image-level scene information. As a fundamental attribute of remote sensing images (RSIs), the scene has a complex intrinsic correlation with salient objects, which may bring hints to improve saliency detection performance. However, existing RSI-SOD datasets lack both pixel- and image-level labels, and it is non-trivial to effectively transfer the scene domain knowledge for more accurate saliency localization. To address these challenges, we first annotate the image-level scene labels of three RSI-SOD datasets inspired by remote sensing scene classification. On top of it, we present a novel scene-guided dual-branch network (SDNet), which can perform cross-task knowledge distillation from the scene classification to facilitate accurate saliency detection. Specifically, a scene knowledge transfer module (SKTM) and a conditional dynamic guidance module (CDGM) are designed for extracting saliency key area as spatial attention from the scene subnet and guiding the saliency subnet to generate scene-enhanced saliency features, respectively. Finally, an object contour awareness module (OCAM) is introduced to enable the model to focus more on irregular spatial details of salient objects from the complicated background. Extensive experiments reveal that our SDNet outperforms over 20 state-of-the-art algorithms on three datasets. Moreover, we prove that the proposed framework is model-agnostic, and its extension to six baselines can bring significant performance benefits. Code is available at https://github.com/lyf0801/SDNet.

KW - Conditional guidance learning

KW - dynamic class activation map (CAM)

KW - optical remote sensing image (RSI)

KW - salient object detection (SOD)

KW - scene knowledge distillation

UR - http://www.scopus.com/inward/record.url?scp=85165916203&partnerID=8YFLogxK

U2 - 10.1109/TGRS.2023.3298661

DO - 10.1109/TGRS.2023.3298661

M3 - Article

AN - SCOPUS:85165916203

SN - 0196-2892

VL - 61

JO - IEEE Transactions on Geoscience and Remote Sensing

JF - IEEE Transactions on Geoscience and Remote Sensing

M1 - 5616416

ER -
