TY - JOUR
T1 - MA-Stereo
T2 - Real-Time Stereo Matching via Multi-Scale Attention Fusion and Spatial Error-Aware Refinement
AU - Gao, Wei
AU - Cai, Yongjie
AU - Akoudad, Youssef
AU - Yang, Yang
AU - Chen, Jie
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Stereo matching is a fundamental task in computer vision, and real-time stereo matching has recently shown great potential in robotics and autonomous driving applications. However, the cost aggregation used in existing real-time stereo matching methods suffers from accuracy limitations in ill-posed regions. Furthermore, most real-time methods struggle to predict disparity in object details and edge areas, producing relatively blurred disparity maps that lack fine detail. To address these issues, we propose a real-time stereo matching architecture called MA-Stereo, which features a multi-scale attention fusion (MAF) module and an attention-based spatial error-aware refinement (ASER) module. The MAF adaptively fuses context and geometry information through an attention mechanism, effectively improving cost aggregation. In addition, the ASER refines the predicted disparity map, fully leveraging high-frequency information and spatial evidence to accurately predict disparities for sharp edges and thin structures. Experimental results on the SceneFlow and KITTI benchmarks demonstrate that MA-Stereo outperforms almost all current state-of-the-art real-time stereo matching methods while maintaining relatively low runtime, achieving a favorable trade-off between accuracy and speed.
KW - autonomous vehicle navigation
KW - Deep learning for visual perception
KW - real-time stereo matching
UR - http://www.scopus.com/inward/record.url?scp=85205429118&partnerID=8YFLogxK
U2 - 10.1109/LRA.2024.3468173
DO - 10.1109/LRA.2024.3468173
M3 - Article
AN - SCOPUS:85205429118
SN - 2377-3766
VL - 9
SP - 9954
EP - 9961
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 11
ER -