TY - JOUR
T1 - Context-Aware 3D Object Detection From a Single Image in Autonomous Driving
AU - Zhou, Dingfu
AU - Song, Xibin
AU - Fang, Jin
AU - Dai, Yuchao
AU - Li, Hongdong
AU - Zhang, Liangjun
N1 - Publisher Copyright:
© 2000-2011 IEEE.
PY - 2022/10/1
Y1 - 2022/10/1
N2 - Camera sensors have been widely used in Driver-Assistance and Autonomous Driving Systems due to their rich texture information. Recently, with the development of deep learning techniques, many approaches have been proposed to detect objects in 3D from a single frame, however, there is still much room for improvement. In this paper, we generally review the recently proposed state-of-the-art monocular-based 3D object detection approaches first. Based on the analysis of the disadvantage of previous center-based frameworks, a novel feature aggregation strategy has been proposed to boost the 3D object detection by exploring the context information. Specifically, an Instance-Guided Spatial Attention (IGSA) module is proposed to collect the local instance information and the Channel-Wise Feature Attention (CWFA) module is employed for aggregating the global context information. In addition, an instance-guided object regression strategy is also proposed to alleviate the influence of center location prediction uncertainty in the inference process. Finally, the proposed approach has been verified on the public 3D object detection benchmark. The experimental results show that the proposed approach can significantly boost the performance of the baseline method on both 3D detection and 2D Bird's-Eye View among all three categories. Furthermore, our method outperforms all the monocular-based methods (even these trained with depth as auxiliary inputs) and achieves state-of-the-art performance on the KITTI benchmark.
AB - Camera sensors have been widely used in Driver-Assistance and Autonomous Driving Systems due to their rich texture information. Recently, with the development of deep learning techniques, many approaches have been proposed to detect objects in 3D from a single frame, however, there is still much room for improvement. In this paper, we generally review the recently proposed state-of-the-art monocular-based 3D object detection approaches first. Based on the analysis of the disadvantage of previous center-based frameworks, a novel feature aggregation strategy has been proposed to boost the 3D object detection by exploring the context information. Specifically, an Instance-Guided Spatial Attention (IGSA) module is proposed to collect the local instance information and the Channel-Wise Feature Attention (CWFA) module is employed for aggregating the global context information. In addition, an instance-guided object regression strategy is also proposed to alleviate the influence of center location prediction uncertainty in the inference process. Finally, the proposed approach has been verified on the public 3D object detection benchmark. The experimental results show that the proposed approach can significantly boost the performance of the baseline method on both 3D detection and 2D Bird's-Eye View among all three categories. Furthermore, our method outperforms all the monocular-based methods (even these trained with depth as auxiliary inputs) and achieves state-of-the-art performance on the KITTI benchmark.
KW - context-aware feature aggregation
KW - Monocular 3D object detection
KW - self-attention
UR - http://www.scopus.com/inward/record.url?scp=85126308446&partnerID=8YFLogxK
U2 - 10.1109/TITS.2022.3154022
DO - 10.1109/TITS.2022.3154022
M3 - 文章
AN - SCOPUS:85126308446
SN - 1524-9050
VL - 23
SP - 18568
EP - 18580
JO - IEEE Transactions on Intelligent Transportation Systems
JF - IEEE Transactions on Intelligent Transportation Systems
IS - 10
ER -