Context-Aware 3D Object Detection From a Single Image in Autonomous Driving

Dingfu Zhou; Xibin Song; Jin Fang; Yuchao Dai; Hongdong Li; Liangjun Zhang

doi:10.1109/TITS.2022.3154022

School of Electronics and Information

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Camera sensors have been widely used in Driver-Assistance and Autonomous Driving Systems due to their rich texture information. Recently, with the development of deep learning techniques, many approaches have been proposed to detect objects in 3D from a single frame; however, there is still much room for improvement. In this paper, we first review recently proposed state-of-the-art monocular 3D object detection approaches. Based on an analysis of the disadvantages of previous center-based frameworks, a novel feature aggregation strategy is proposed to boost 3D object detection by exploiting context information. Specifically, an Instance-Guided Spatial Attention (IGSA) module is proposed to collect local instance information, and a Channel-Wise Feature Attention (CWFA) module is employed to aggregate global context information. In addition, an instance-guided object regression strategy is proposed to alleviate the influence of center location prediction uncertainty during inference. Finally, the proposed approach has been verified on the public KITTI 3D object detection benchmark. The experimental results show that the proposed approach significantly boosts the performance of the baseline method on both 3D detection and 2D Bird's-Eye View across all three categories. Furthermore, our method outperforms all monocular-based methods (even those trained with depth as auxiliary input) and achieves state-of-the-art performance on the KITTI benchmark.
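The abstract names the CWFA module but does not describe its internals. As a hedged illustration of the general idea of channel-wise attention driven by global context, here is a minimal pure-Python sketch of a squeeze-and-excitation-style gate. It is an assumption chosen for illustration, not the authors' CWFA design; the function names and the weights `w1`/`w2` are hypothetical, and biases are omitted for brevity.

```python
import math
from typing import List

# A single channel is an H x W grid; a feature map is a list of C channels.
Grid = List[List[float]]


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(features: List[Grid], w1: List[List[float]],
                      w2: List[List[float]]) -> List[Grid]:
    """Generic squeeze-and-excitation-style channel attention (illustrative only).

    features: C channels, each an H x W grid of activations.
    w1: hypothetical (hidden x C) weights squeezing C descriptors to a bottleneck.
    w2: hypothetical (C x hidden) weights expanding back to one gate per channel.
    """
    # Squeeze: global average pooling over H x W gives one descriptor per
    # channel, so each gate depends on context from the whole feature map.
    descriptor = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                  for ch in features]
    # Excitation: linear -> ReLU -> linear -> sigmoid yields gates in (0, 1).
    hidden = [max(0.0, sum(w * d for w, d in zip(row, descriptor))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every activation in channel c by that channel's gate.
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, features)]


if __name__ == "__main__":
    # Two 2 x 2 channels; zero output weights give every gate sigmoid(0) = 0.5.
    feats = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
    out = channel_attention(feats, w1=[[1.0, 0.0]], w2=[[0.0], [0.0]])
    print(out)  # [[[0.5, 0.5], [0.5, 0.5]], [[1.0, 1.0], [1.0, 1.0]]]
```

Because each channel's gate is computed from an average over the entire feature map, image-wide context can rescale local features; in a real detector `w1` and `w2` would be learned during training rather than fixed by hand.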

Original language	English
Pages (from-to)	18568-18580
Number of pages	13
Journal	IEEE Transactions on Intelligent Transportation Systems
Volume	23
Issue number	10
DOIs	https://doi.org/10.1109/TITS.2022.3154022
State	Published - 1 Oct 2022

Keywords

context-aware feature aggregation
Monocular 3D object detection
self-attention

Cite this

@article{d919fc56ff1d48c484507b111cbad676,

title = "Context-Aware 3D Object Detection From a Single Image in Autonomous Driving",

abstract = "Camera sensors have been widely used in Driver-Assistance and Autonomous Driving Systems due to their rich texture information. Recently, with the development of deep learning techniques, many approaches have been proposed to detect objects in 3D from a single frame; however, there is still much room for improvement. In this paper, we first review recently proposed state-of-the-art monocular 3D object detection approaches. Based on an analysis of the disadvantages of previous center-based frameworks, a novel feature aggregation strategy is proposed to boost 3D object detection by exploiting context information. Specifically, an Instance-Guided Spatial Attention (IGSA) module is proposed to collect local instance information, and a Channel-Wise Feature Attention (CWFA) module is employed to aggregate global context information. In addition, an instance-guided object regression strategy is proposed to alleviate the influence of center location prediction uncertainty during inference. Finally, the proposed approach has been verified on the public KITTI 3D object detection benchmark. The experimental results show that the proposed approach significantly boosts the performance of the baseline method on both 3D detection and 2D Bird's-Eye View across all three categories. Furthermore, our method outperforms all monocular-based methods (even those trained with depth as auxiliary input) and achieves state-of-the-art performance on the KITTI benchmark.",

keywords = "context-aware feature aggregation, Monocular 3D object detection, self-attention",

author = "Dingfu Zhou and Xibin Song and Jin Fang and Yuchao Dai and Hongdong Li and Liangjun Zhang",

note = "Publisher Copyright: {\textcopyright} 2000-2011 IEEE.",

year = "2022",

month = oct,

day = "1",

doi = "10.1109/TITS.2022.3154022",

language = "English",

volume = "23",

pages = "18568--18580",

journal = "IEEE Transactions on Intelligent Transportation Systems",

issn = "1524-9050",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "10",

}

TY - JOUR

T1 - Context-Aware 3D Object Detection From a Single Image in Autonomous Driving

AU - Zhou, Dingfu

AU - Song, Xibin

AU - Fang, Jin

AU - Dai, Yuchao

AU - Li, Hongdong

AU - Zhang, Liangjun

PY - 2022/10/1

Y1 - 2022/10/1

N2 - Camera sensors have been widely used in Driver-Assistance and Autonomous Driving Systems due to their rich texture information. Recently, with the development of deep learning techniques, many approaches have been proposed to detect objects in 3D from a single frame; however, there is still much room for improvement. In this paper, we first review recently proposed state-of-the-art monocular 3D object detection approaches. Based on an analysis of the disadvantages of previous center-based frameworks, a novel feature aggregation strategy is proposed to boost 3D object detection by exploiting context information. Specifically, an Instance-Guided Spatial Attention (IGSA) module is proposed to collect local instance information, and a Channel-Wise Feature Attention (CWFA) module is employed to aggregate global context information. In addition, an instance-guided object regression strategy is proposed to alleviate the influence of center location prediction uncertainty during inference. Finally, the proposed approach has been verified on the public KITTI 3D object detection benchmark. The experimental results show that the proposed approach significantly boosts the performance of the baseline method on both 3D detection and 2D Bird's-Eye View across all three categories. Furthermore, our method outperforms all monocular-based methods (even those trained with depth as auxiliary input) and achieves state-of-the-art performance on the KITTI benchmark.

AB - Camera sensors have been widely used in Driver-Assistance and Autonomous Driving Systems due to their rich texture information. Recently, with the development of deep learning techniques, many approaches have been proposed to detect objects in 3D from a single frame; however, there is still much room for improvement. In this paper, we first review recently proposed state-of-the-art monocular 3D object detection approaches. Based on an analysis of the disadvantages of previous center-based frameworks, a novel feature aggregation strategy is proposed to boost 3D object detection by exploiting context information. Specifically, an Instance-Guided Spatial Attention (IGSA) module is proposed to collect local instance information, and a Channel-Wise Feature Attention (CWFA) module is employed to aggregate global context information. In addition, an instance-guided object regression strategy is proposed to alleviate the influence of center location prediction uncertainty during inference. Finally, the proposed approach has been verified on the public KITTI 3D object detection benchmark. The experimental results show that the proposed approach significantly boosts the performance of the baseline method on both 3D detection and 2D Bird's-Eye View across all three categories. Furthermore, our method outperforms all monocular-based methods (even those trained with depth as auxiliary input) and achieves state-of-the-art performance on the KITTI benchmark.

KW - context-aware feature aggregation

KW - Monocular 3D object detection

KW - self-attention

UR - http://www.scopus.com/inward/record.url?scp=85126308446&partnerID=8YFLogxK

U2 - 10.1109/TITS.2022.3154022

DO - 10.1109/TITS.2022.3154022

M3 - Article

AN - SCOPUS:85126308446

SN - 1524-9050

VL - 23

SP - 18568

EP - 18580

JO - IEEE Transactions on Intelligent Transportation Systems

JF - IEEE Transactions on Intelligent Transportation Systems

IS - 10

ER -
