TY - JOUR
T1 - Two- and three-dimensional deep human detection by generating orthographic top view image from dense point cloud
AU - Tan, Fang
AU - Feng, Xiaoyi
AU - Ma, Yupeng
AU - Xia, Zhaoqiang
N1 - Publisher Copyright:
© 2022 SPIE and IS&T.
PY - 2022/5/1
Y1 - 2022/5/1
N2 - Human detection still suffers from occlusion, complex backgrounds, and scale-variation problems. Projecting three-dimensional (3D) points onto the ground to generate an orthographic top view (OTV) image for detection can effectively alleviate these problems. However, depth sensors may be placed arbitrarily, which makes it difficult to create OTV images from the dense point cloud converted from a depth image. We focus on the generation of OTV images and on human detection using the constructed OTV images. First, we propose a ground plane extraction method that is well suited to various camera positions and orientations in complex scenes. Next, points are transformed into a uniform coordinate system using the ground plane parameters and encoded to generate a three-channel OTV image. Then, a mainstream two-dimensional (2D) detection network is employed to detect humans directly on the OTV images, and the 3D bounding box is further obtained by computing the mapping from the OTV image. In addition, we propose a semiautomated annotation method to address the scarcity of OTV image annotations. The proposed method is evaluated on the EPFL dataset, including its two subsets, and achieves state-of-the-art performance compared with existing approaches. Moreover, our 2D and 3D human detection method runs at more than 26 FPS on a CPU.
AB - Human detection still suffers from occlusion, complex backgrounds, and scale-variation problems. Projecting three-dimensional (3D) points onto the ground to generate an orthographic top view (OTV) image for detection can effectively alleviate these problems. However, depth sensors may be placed arbitrarily, which makes it difficult to create OTV images from the dense point cloud converted from a depth image. We focus on the generation of OTV images and on human detection using the constructed OTV images. First, we propose a ground plane extraction method that is well suited to various camera positions and orientations in complex scenes. Next, points are transformed into a uniform coordinate system using the ground plane parameters and encoded to generate a three-channel OTV image. Then, a mainstream two-dimensional (2D) detection network is employed to detect humans directly on the OTV images, and the 3D bounding box is further obtained by computing the mapping from the OTV image. In addition, we propose a semiautomated annotation method to address the scarcity of OTV image annotations. The proposed method is evaluated on the EPFL dataset, including its two subsets, and achieves state-of-the-art performance compared with existing approaches. Moreover, our 2D and 3D human detection method runs at more than 26 FPS on a CPU.
KW - Dense point clouds
KW - Depth image
KW - Ground plane extraction
KW - Orthographic top view
KW - Three-dimensional human detection
UR - http://www.scopus.com/inward/record.url?scp=85133705100&partnerID=8YFLogxK
U2 - 10.1117/1.JEI.31.3.033009
DO - 10.1117/1.JEI.31.3.033009
M3 - Article
AN - SCOPUS:85133705100
SN - 1017-9909
VL - 31
JO - Journal of Electronic Imaging
JF - Journal of Electronic Imaging
IS - 3
M1 - 033009
ER -