TY - JOUR
T1 - Two- and three-dimensional deep human detection by generating orthographic top view image from dense point cloud
AU - Tan, Fang
AU - Feng, Xiaoyi
AU - Ma, Yupeng
AU - Xia, Zhaoqiang
N1 - Publisher Copyright:
© 2022 SPIE and IS&T.
PY - 2022/5/1
Y1 - 2022/5/1
N2 - Human detection still suffers from occlusion, complex backgrounds, and scale-variation problems. Projecting three-dimensional (3D) points onto the ground to generate an orthographic top view (OTV) image for detection can effectively alleviate these problems. However, depth sensors may be placed arbitrarily, which makes it difficult to create OTV images from the dense point cloud converted from a depth image. We focus on the generation of OTV images and on human detection using the constructed OTV images. First, we propose a ground plane extraction method that is well suited to various camera positions and orientations in complex scenes. Next, points are transformed into a uniform coordinate system using the ground plane parameters and encoded to generate a three-channel OTV image. Then, a mainstream two-dimensional (2D) detection network is employed to detect humans directly on the OTV images, and the 3D bounding box is further obtained by computing the mapping from the OTV image. In addition, we propose a semiautomated annotation method to address the scarcity of OTV image annotations. The proposed method is evaluated on the EPFL dataset, including its two subsets, and achieves state-of-the-art performance compared with existing approaches. Moreover, our 2D and 3D human detection method runs at more than 26 FPS on a CPU.
AB - Human detection still suffers from occlusion, complex backgrounds, and scale-variation problems. Projecting three-dimensional (3D) points onto the ground to generate an orthographic top view (OTV) image for detection can effectively alleviate these problems. However, depth sensors may be placed arbitrarily, which makes it difficult to create OTV images from the dense point cloud converted from a depth image. We focus on the generation of OTV images and on human detection using the constructed OTV images. First, we propose a ground plane extraction method that is well suited to various camera positions and orientations in complex scenes. Next, points are transformed into a uniform coordinate system using the ground plane parameters and encoded to generate a three-channel OTV image. Then, a mainstream two-dimensional (2D) detection network is employed to detect humans directly on the OTV images, and the 3D bounding box is further obtained by computing the mapping from the OTV image. In addition, we propose a semiautomated annotation method to address the scarcity of OTV image annotations. The proposed method is evaluated on the EPFL dataset, including its two subsets, and achieves state-of-the-art performance compared with existing approaches. Moreover, our 2D and 3D human detection method runs at more than 26 FPS on a CPU.
KW - Dense point clouds
KW - Depth image
KW - Ground plane extraction
KW - Orthographic top view
KW - Three-dimensional human detection
UR - http://www.scopus.com/inward/record.url?scp=85133705100&partnerID=8YFLogxK
U2 - 10.1117/1.JEI.31.3.033009
DO - 10.1117/1.JEI.31.3.033009
M3 - Article
AN - SCOPUS:85133705100
SN - 1017-9909
VL - 31
JO - Journal of Electronic Imaging
JF - Journal of Electronic Imaging
IS - 3
M1 - 033009
ER -