NLCA-Net: A non-local context attention network for stereo matching

Zhibo Rao; Mingyi He; Yuchao Dai; Zhidong Zhu; Bo Li; Renjie He

doi:10.1017/ATSIP.2020.16

School of Electronics and Information

Research output: Contribution to journal › Article › peer-review

53 Scopus citations

Abstract

Accurate disparity prediction is a hot spot in computer vision, and how to efficiently exploit contextual information is the key to improve the performance. In this paper, we propose a simple yet effective non-local context attention network to exploit the global context information by using attention mechanisms and semantic information for stereo matching. First, we develop a 2D geometry feature learning module to get a more discriminative representation by taking advantage of multi-scale features and form them into the variance-based cost volume. Then, we construct a non-local attention matching module by using the non-local block and hierarchical 3D convolutions, which can effectively regularize the cost volume and capture the global contextual information. Finally, we adopt a geometry refinement module to refine the disparity map to further improve the performance. Moreover, we add the warping loss function to help the model learn the matching rule of the non-occluded region. Our experiments show that (1) our approach achieves competitive results on KITTI and SceneFlow datasets in the end-point error and the fraction of erroneous pixels; (2) our proposed method particularly has superior performance in the reflective regions and occluded areas.

Original language	English
Article number	e18
Journal	APSIPA Transactions on Signal and Information Processing
Volume	9
DOIs	https://doi.org/10.1017/ATSIP.2020.16
State	Published - 19 Feb 2020

Keywords

Geometry context
Geometry refine
Non-local attention
Stereo matching

Access to Document

10.1017/ATSIP.2020.16

Cite this

@article{f773eabc9fca4eb4a324c88bebda9242,

title = "NLCA-Net: A non-local context attention network for stereo matching",

abstract = "Accurate disparity prediction is a hot spot in computer vision, and how to efficiently exploit contextual information is the key to improve the performance. In this paper, we propose a simple yet effective non-local context attention network to exploit the global context information by using attention mechanisms and semantic information for stereo matching. First, we develop a 2D geometry feature learning module to get a more discriminative representation by taking advantage of multi-scale features and form them into the variance-based cost volume. Then, we construct a non-local attention matching module by using the non-local block and hierarchical 3D convolutions, which can effectively regularize the cost volume and capture the global contextual information. Finally, we adopt a geometry refinement module to refine the disparity map to further improve the performance. Moreover, we add the warping loss function to help the model learn the matching rule of the non-occluded region. Our experiments show that (1) our approach achieves competitive results on KITTI and SceneFlow datasets in the end-point error and the fraction of erroneous pixels; (2) our proposed method particularly has superior performance in the reflective regions and occluded areas.",

keywords = "Geometry context, Geometry refine, Non-local attention, Stereo matching",

author = "Zhibo Rao and Mingyi He and Yuchao Dai and Zhidong Zhu and Bo Li and Renjie He",

note = "Publisher Copyright: {\textcopyright} 2020 The Author(s).",

year = "2020",

month = feb,

day = "19",

doi = "10.1017/ATSIP.2020.16",

language = "English",

volume = "9",

journal = "APSIPA Transactions on Signal and Information Processing",

issn = "2048-7703",

publisher = "Cambridge University Press",

}

TY - JOUR

T1 - NLCA-Net

T2 - A non-local context attention network for stereo matching

AU - Rao, Zhibo

AU - He, Mingyi

AU - Dai, Yuchao

AU - Zhu, Zhidong

AU - Li, Bo

AU - He, Renjie

PY - 2020/2/19

Y1 - 2020/2/19

N2 - Accurate disparity prediction is a hot spot in computer vision, and how to efficiently exploit contextual information is the key to improve the performance. In this paper, we propose a simple yet effective non-local context attention network to exploit the global context information by using attention mechanisms and semantic information for stereo matching. First, we develop a 2D geometry feature learning module to get a more discriminative representation by taking advantage of multi-scale features and form them into the variance-based cost volume. Then, we construct a non-local attention matching module by using the non-local block and hierarchical 3D convolutions, which can effectively regularize the cost volume and capture the global contextual information. Finally, we adopt a geometry refinement module to refine the disparity map to further improve the performance. Moreover, we add the warping loss function to help the model learn the matching rule of the non-occluded region. Our experiments show that (1) our approach achieves competitive results on KITTI and SceneFlow datasets in the end-point error and the fraction of erroneous pixels; (2) our proposed method particularly has superior performance in the reflective regions and occluded areas.

AB - Accurate disparity prediction is a hot spot in computer vision, and how to efficiently exploit contextual information is the key to improve the performance. In this paper, we propose a simple yet effective non-local context attention network to exploit the global context information by using attention mechanisms and semantic information for stereo matching. First, we develop a 2D geometry feature learning module to get a more discriminative representation by taking advantage of multi-scale features and form them into the variance-based cost volume. Then, we construct a non-local attention matching module by using the non-local block and hierarchical 3D convolutions, which can effectively regularize the cost volume and capture the global contextual information. Finally, we adopt a geometry refinement module to refine the disparity map to further improve the performance. Moreover, we add the warping loss function to help the model learn the matching rule of the non-occluded region. Our experiments show that (1) our approach achieves competitive results on KITTI and SceneFlow datasets in the end-point error and the fraction of erroneous pixels; (2) our proposed method particularly has superior performance in the reflective regions and occluded areas.

KW - Geometry context

KW - Geometry refine

KW - Non-local attention

KW - Stereo matching

UR - http://www.scopus.com/inward/record.url?scp=85090789308&partnerID=8YFLogxK

U2 - 10.1017/ATSIP.2020.16

DO - 10.1017/ATSIP.2020.16

M3 - Article

AN - SCOPUS:85090789308

SN - 2048-7703

VL - 9

JO - APSIPA Transactions on Signal and Information Processing

JF - APSIPA Transactions on Signal and Information Processing

M1 - e18

ER -
