TY - JOUR
T1 - Bidirectional Guided Attention Network for 3-D Semantic Detection of Remote Sensing Images
AU - Rao, Zhibo
AU - He, Mingyi
AU - Zhu, Zhidong
AU - Dai, Yuchao
AU - He, Renjie
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2021/7
Y1 - 2021/7
N2 - Semantic segmentation and disparity estimation are in the research frontier of the computer vision and remote sensing (RS) fields. However, existing methods mostly deal with these two problems separately or use a combination of multiple models to solve these two tasks. Due to a lack of sufficient information sharing and fusion, they still have difficulties in coping with seasonal appearance differences in 3-D RS problems. In this article, we propose a novel multitask learning architecture that considers the bottom-up and up-bottom visual attention mechanism for 3-D semantic detection, named bidirectional guided attention network (BGA-Net). BGA-Net consists of five modules: unified backbone module (UBM), bidirectional guided attention module (BGAM), semantic segmentation module (SSM), feature matching module (FMM), and bidirectional fusion module (BFM). First, in UBM, we use a shared backbone to extract unified features and share them with three branches/modules (BGAM, SSM, and FMM). Then, SSM and FMM branches are applied to estimate segmentation and disparity maps, whereas the third branch/module (BGAM) shares the global features to guide the task-specific learning via attention mechanism. Finally, we fuse the results of the two tasks by BFM to improve the final performance. Extensive experiments demonstrate that: 1) our BGA-Net can handle the two tasks simultaneously and can be trained in an end-to-end way; 2) these modules fully take advantage of the two tasks' information to share features and enhance the scene understanding ability, effectively against seasons change of RS images; and 3) BGA-Net has notable superiority and greater flexibility and also sets a new state of the art on the urban semantic 3-D (US3D) benchmark. Moreover, BGA-Net also provides insights into the intelligent interpretation of RS data images.
AB - Semantic segmentation and disparity estimation are in the research frontier of the computer vision and remote sensing (RS) fields. However, existing methods mostly deal with these two problems separately or use a combination of multiple models to solve these two tasks. Due to a lack of sufficient information sharing and fusion, they still have difficulties in coping with seasonal appearance differences in 3-D RS problems. In this article, we propose a novel multitask learning architecture that considers the bottom-up and up-bottom visual attention mechanism for 3-D semantic detection, named bidirectional guided attention network (BGA-Net). BGA-Net consists of five modules: unified backbone module (UBM), bidirectional guided attention module (BGAM), semantic segmentation module (SSM), feature matching module (FMM), and bidirectional fusion module (BFM). First, in UBM, we use a shared backbone to extract unified features and share them with three branches/modules (BGAM, SSM, and FMM). Then, SSM and FMM branches are applied to estimate segmentation and disparity maps, whereas the third branch/module (BGAM) shares the global features to guide the task-specific learning via attention mechanism. Finally, we fuse the results of the two tasks by BFM to improve the final performance. Extensive experiments demonstrate that: 1) our BGA-Net can handle the two tasks simultaneously and can be trained in an end-to-end way; 2) these modules fully take advantage of the two tasks' information to share features and enhance the scene understanding ability, effectively against seasons change of RS images; and 3) BGA-Net has notable superiority and greater flexibility and also sets a new state of the art on the urban semantic 3-D (US3D) benchmark. Moreover, BGA-Net also provides insights into the intelligent interpretation of RS data images.
KW - Bidirectional guided aggregation
KW - Remote sensing image
KW - Semantic segmentation
KW - Stereo matching
KW - Visual attention mechanism
UR - http://www.scopus.com/inward/record.url?scp=85112333220&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2020.3029527
DO - 10.1109/TGRS.2020.3029527
M3 - 文章
AN - SCOPUS:85112333220
SN - 0196-2892
VL - 59
SP - 6138
EP - 6153
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
IS - 7
M1 - 9235481
ER -