TY - JOUR
T1 - MA-Stereo
T2 - Real-Time Stereo Matching via Multi-Scale Attention Fusion and Spatial Error-Aware Refinement
AU - Gao, Wei
AU - Cai, Yongjie
AU - Akoudad, Youssef
AU - Yang, Yang
AU - Chen, Jie
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Stereo matching is a fundamental task in computer vision, and real-time stereo matching has recently shown great potential in robotics and autonomous driving applications. However, the cost aggregation used in existing real-time stereo matching methods suffers from accuracy limitations in ill-posed regions. Furthermore, most real-time methods struggle to predict disparity in object details and edge areas, producing relatively blurred disparity maps that lack fine detail. To address these issues, we propose a real-time stereo matching architecture called MA-Stereo, which features a multi-scale attention fusion (MAF) module and an attention-based spatial error-aware refinement (ASER) module. The MAF adaptively fuses context and geometry information through an attention mechanism, effectively improving cost aggregation. In addition, the ASER refines the predicted disparity map, fully leveraging high-frequency information and spatial evidence to accurately predict disparities for sharp edges and thin structures. Experimental results on the SceneFlow and KITTI benchmarks demonstrate that MA-Stereo outperforms almost all current state-of-the-art real-time stereo matching methods while maintaining relatively low runtime, achieving a favorable trade-off between accuracy and speed.
KW - autonomous vehicle navigation
KW - Deep learning for visual perception
KW - real-time stereo matching
UR - http://www.scopus.com/inward/record.url?scp=85205429118&partnerID=8YFLogxK
U2 - 10.1109/LRA.2024.3468173
DO - 10.1109/LRA.2024.3468173
M3 - Article
AN - SCOPUS:85205429118
SN - 2377-3766
VL - 9
SP - 9954
EP - 9961
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 11
ER -