Bidirectional Guided Attention Network for 3-D Semantic Detection of Remote Sensing Images

Zhibo Rao, Mingyi He, Zhidong Zhu, Yuchao Dai, Renjie He

Research output: Contribution to journal › Article › peer-review

35 Scopus citations

Abstract

Semantic segmentation and disparity estimation are at the research frontier of the computer vision and remote sensing (RS) fields. However, existing methods mostly address these two problems separately or combine multiple models to solve them. Because they lack sufficient information sharing and fusion, they still struggle to cope with seasonal appearance differences in 3-D RS problems. In this article, we propose a novel multitask learning architecture for 3-D semantic detection that incorporates both bottom-up and top-down visual attention mechanisms, named bidirectional guided attention network (BGA-Net). BGA-Net consists of five modules: unified backbone module (UBM), bidirectional guided attention module (BGAM), semantic segmentation module (SSM), feature matching module (FMM), and bidirectional fusion module (BFM). First, in UBM, we use a shared backbone to extract unified features and share them with three branches/modules (BGAM, SSM, and FMM). Then, the SSM and FMM branches estimate the segmentation and disparity maps, whereas the third branch/module (BGAM) shares the global features to guide the task-specific learning via an attention mechanism. Finally, we fuse the results of the two tasks with BFM to improve the final performance. Extensive experiments demonstrate that: 1) BGA-Net can handle the two tasks simultaneously and can be trained end to end; 2) its modules take full advantage of the two tasks' information to share features and enhance scene understanding, making it robust to seasonal changes in RS images; and 3) BGA-Net offers notable superiority and greater flexibility, and sets a new state of the art on the urban semantic 3-D (US3D) benchmark. Moreover, BGA-Net provides insights into the intelligent interpretation of RS images.
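The data flow described in the abstract (shared backbone, an attention branch guiding two task branches, then fusion) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the class name `ToyBGANet`, the random 1x1 channel mixes, the channel-gating form of the attention, the absolute-difference cost volume, and the fusion step are all assumptions chosen only to make the five-module structure concrete.

```python
import numpy as np

def conv1x1(x, w):
    """Pointwise convolution: x is (C, H, W), w is (C_out, C)."""
    return np.einsum("oc,chw->ohw", w, x)

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ToyBGANet:
    """Hypothetical data-flow sketch; every layer is a random 1x1 mix."""

    def __init__(self, in_ch=3, feat_ch=8, n_classes=5, max_disp=4, seed=0):
        rng = np.random.default_rng(seed)
        self.max_disp = max_disp
        self.w_backbone = rng.normal(size=(feat_ch, in_ch))
        self.w_att_seg = rng.normal(size=(feat_ch, feat_ch))
        self.w_att_disp = rng.normal(size=(feat_ch, feat_ch))
        self.w_seg = rng.normal(size=(n_classes, feat_ch))
        self.w_fuse = rng.normal(size=(n_classes, n_classes + 1))

    def forward(self, left, right):
        # UBM: one shared backbone extracts features from both views.
        f_left = np.tanh(conv1x1(left, self.w_backbone))
        f_right = np.tanh(conv1x1(right, self.w_backbone))

        # BGAM (assumed form): global context -> per-channel gates per task.
        context = f_left.mean(axis=(1, 2))
        gate_seg = sigmoid(self.w_att_seg @ context)
        gate_disp = sigmoid(self.w_att_disp @ context)

        # SSM: per-pixel class logits from attention-gated features.
        seg_logits = conv1x1(f_left * gate_seg[:, None, None], self.w_seg)

        # FMM: cost volume over candidate disparities, then argmin.
        g_left = f_left * gate_disp[:, None, None]
        g_right = f_right * gate_disp[:, None, None]
        _, h, w = g_left.shape
        cost = np.full((self.max_disp, h, w), np.inf)
        for d in range(self.max_disp):
            # Left pixel x is compared with right pixel x - d.
            diff = np.abs(g_left[:, :, d:] - g_right[:, :, : w - d])
            cost[d, :, d:] = diff.sum(axis=0)
        disparity = cost.argmin(axis=0)

        # BFM (placeholder): refine class scores using the disparity map.
        fused_in = np.concatenate(
            [softmax(seg_logits), disparity[None] / self.max_disp], axis=0
        )
        labels = conv1x1(fused_in, self.w_fuse).argmax(axis=0)
        return labels, disparity

# Synthetic stereo pair: the right view is the left view shifted by 2 pixels.
rng = np.random.default_rng(1)
left = rng.random((3, 6, 10))
right = np.empty_like(left)
right[:, :, :-2] = left[:, :, 2:]
right[:, :, -2:] = rng.random((3, 6, 2))

labels, disparity = ToyBGANet().forward(left, right)
```

Both outputs come from a single forward pass over shared features, which is the property the abstract emphasizes; a trainable version would replace the random mixes with learned convolutional layers and a joint loss over the two tasks.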

Original language: English
Article number: 9235481
Pages (from-to): 6138-6153
Number of pages: 16
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 59
Issue number: 7
State: Published - Jul 2021

Keywords

  • Bidirectional guided aggregation
  • Remote sensing image
  • Semantic segmentation
  • Stereo matching
  • Visual attention mechanism
