TY - JOUR
T1 - Sliding space-disparity transformer for stereo matching
AU - Rao, Zhibo
AU - He, Mingyi
AU - Dai, Yuchao
AU - Shen, Zhelun
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.
PY - 2022/12
Y1 - 2022/12
N2 - Transformers have achieved impressive performance in natural language processing and computer vision, including text translation, semantic segmentation, etc. However, due to the excessive computation and memory occupation of self-attention, stereo matching has not shared this success. To bring this technology to stereo matching, especially under limited hardware resources, we propose a sliding space-disparity transformer named SSD-former. Guided by the matching model, we simplify the transformer to achieve faster speed, lower memory occupation, and competitive performance. First, we employ a sliding window scheme to limit self-attention operations within the cost volume and adapt to different resolutions, bringing efficiency and flexibility. Second, our space-disparity transformer remarkably reduces memory occupation and computation by computing the current patch’s self-attention with only two parts: (1) all patches of the current disparity level across all spatial locations and (2) the patches of different disparity levels at the same spatial location. The experiments demonstrate that: (1) unlike the standard transformer, SSD-former is fast and memory-friendly; (2) compared with 3D convolution methods, SSD-former has a larger receptive field and runs at impressive speed, showing great potential in stereo matching; and (3) our model obtains state-of-the-art performance and faster speed on multiple popular datasets, achieving the best speed–accuracy trade-off.
AB - Transformers have achieved impressive performance in natural language processing and computer vision, including text translation, semantic segmentation, etc. However, due to the excessive computation and memory occupation of self-attention, stereo matching has not shared this success. To bring this technology to stereo matching, especially under limited hardware resources, we propose a sliding space-disparity transformer named SSD-former. Guided by the matching model, we simplify the transformer to achieve faster speed, lower memory occupation, and competitive performance. First, we employ a sliding window scheme to limit self-attention operations within the cost volume and adapt to different resolutions, bringing efficiency and flexibility. Second, our space-disparity transformer remarkably reduces memory occupation and computation by computing the current patch’s self-attention with only two parts: (1) all patches of the current disparity level across all spatial locations and (2) the patches of different disparity levels at the same spatial location. The experiments demonstrate that: (1) unlike the standard transformer, SSD-former is fast and memory-friendly; (2) compared with 3D convolution methods, SSD-former has a larger receptive field and runs at impressive speed, showing great potential in stereo matching; and (3) our model obtains state-of-the-art performance and faster speed on multiple popular datasets, achieving the best speed–accuracy trade-off.
KW - Sliding windows
KW - Space-disparity attention mechanism
KW - Stereo matching
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85136830635&partnerID=8YFLogxK
U2 - 10.1007/s00521-022-07621-7
DO - 10.1007/s00521-022-07621-7
M3 - Article
AN - SCOPUS:85136830635
SN - 0941-0643
VL - 34
SP - 21863
EP - 21876
JO - Neural Computing and Applications
JF - Neural Computing and Applications
IS - 24
ER -