TY - JOUR
T1 - Sliding space-disparity transformer for stereo matching
AU - Rao, Zhibo
AU - He, Mingyi
AU - Dai, Yuchao
AU - Shen, Zhelun
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.
PY - 2022/12
Y1 - 2022/12
N2 - Transformers have achieved impressive performance in natural language processing and computer vision, including text translation, semantic segmentation, etc. However, due to the excessive computation and memory occupation of self-attention, stereo matching has not shared this success. To bring this technology to stereo matching, especially under limited hardware resources, we propose a sliding space-disparity transformer named SSD-former. Guided by the matching model, we simplify the transformer to achieve faster speed, lower memory occupation, and competitive performance. First, we employ a sliding window scheme to limit self-attention operations within the cost volume and adapt to different resolutions, bringing efficiency and flexibility. Second, our space-disparity transformer remarkably reduces memory occupation and computation by computing the current patch’s self-attention with only two parts: (1) all patches of the current disparity level across all spatial locations and (2) the patches of different disparity levels at the same spatial location. The experiments demonstrate that: (1) unlike the standard transformer, SSD-former is fast and memory-friendly; (2) compared with 3D convolution methods, SSD-former has a larger receptive field and runs at impressive speed, showing great potential in stereo matching; and (3) our model obtains state-of-the-art performance and faster speed on multiple popular datasets, achieving the best speed–accuracy trade-off.
AB - Transformers have achieved impressive performance in natural language processing and computer vision, including text translation, semantic segmentation, etc. However, due to the excessive computation and memory occupation of self-attention, stereo matching has not shared this success. To bring this technology to stereo matching, especially under limited hardware resources, we propose a sliding space-disparity transformer named SSD-former. Guided by the matching model, we simplify the transformer to achieve faster speed, lower memory occupation, and competitive performance. First, we employ a sliding window scheme to limit self-attention operations within the cost volume and adapt to different resolutions, bringing efficiency and flexibility. Second, our space-disparity transformer remarkably reduces memory occupation and computation by computing the current patch’s self-attention with only two parts: (1) all patches of the current disparity level across all spatial locations and (2) the patches of different disparity levels at the same spatial location. The experiments demonstrate that: (1) unlike the standard transformer, SSD-former is fast and memory-friendly; (2) compared with 3D convolution methods, SSD-former has a larger receptive field and runs at impressive speed, showing great potential in stereo matching; and (3) our model obtains state-of-the-art performance and faster speed on multiple popular datasets, achieving the best speed–accuracy trade-off.
KW - Sliding windows
KW - Space-disparity attention mechanism
KW - Stereo matching
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85136830635&partnerID=8YFLogxK
U2 - 10.1007/s00521-022-07621-7
DO - 10.1007/s00521-022-07621-7
M3 - Article
AN - SCOPUS:85136830635
SN - 0941-0643
VL - 34
SP - 21863
EP - 21876
JO - Neural Computing and Applications
JF - Neural Computing and Applications
IS - 24
ER -