PoseFlow: A Deep Motion Representation for Understanding Human Behaviors in Videos

Dingwen Zhang; Guangyu Guo; Dong Huang; Junwei Han

doi:10.1109/CVPR.2018.00707

School of Automation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

34 Scopus citations

Abstract

Motion of the human body is the critical cue for understanding and characterizing human behavior in videos. Most existing approaches exploit this cue using optical flow. However, optical flow usually captures motion of both the human bodies of interest and the undesired background. This 'noisy' motion representation makes pose estimation and action recognition very challenging in real scenarios. To address this issue, this paper presents a novel deep motion representation, called PoseFlow, which reveals human motion in videos while suppressing background and motion blur and remaining robust to occlusion. To learn PoseFlow at mild computational cost, we propose a functionally structured spatial-temporal deep network, PoseFlow Net (PFN), that jointly solves the skeleton localization and matching problems of PoseFlow. Comprehensive experiments show that PFN outperforms state-of-the-art deep flow estimation models in generating PoseFlow. Moreover, PoseFlow demonstrates its potential for improving two challenging tasks in human video analysis: pose estimation and action recognition.

Original language	English
Title of host publication	Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
Publisher	IEEE Computer Society
Pages	6762-6770
Number of pages	9
ISBN (Electronic)	9781538664209
DOIs	https://doi.org/10.1109/CVPR.2018.00707
State	Published - 14 Dec 2018
Event	31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018; Salt Lake City, United States; Duration: 18 Jun 2018 → 22 Jun 2018

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)	1063-6919

Conference

Conference	31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
Country/Territory	United States
City	Salt Lake City
Period	18/06/18 → 22/06/18

Access to Document

10.1109/CVPR.2018.00707

Cite this

Zhang, D., Guo, G., Huang, D., & Han, J. (2018). PoseFlow: A Deep Motion Representation for Understanding Human Behaviors in Videos. In Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018 (pp. 6762-6770). Article 8578805 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). IEEE Computer Society. https://doi.org/10.1109/CVPR.2018.00707

Zhang, Dingwen; Guo, Guangyu; Huang, Dong et al. / PoseFlow: A Deep Motion Representation for Understanding Human Behaviors in Videos. Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018. IEEE Computer Society, 2018. pp. 6762-6770 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

@inproceedings{8760997b4e7847b0a2496ad20b91e817,

title = "PoseFlow: A Deep Motion Representation for Understanding Human Behaviors in Videos",

abstract = "Motion of the human body is the critical cue for understanding and characterizing human behavior in videos. Most existing approaches explore the motion cue using optical flows. However, optical flow usually contains motion on both the interested human bodies and the undesired background. This 'noisy' motion representation makes it very challenging for pose estimation and action recognition in real scenarios. To address this issue, this paper presents a novel deep motion representation, called PoseFlow, which reveals human motion in videos while suppressing background and motion blur, and being robust to occlusion. For learning PoseFlow with mild computational cost, we propose a functionally structured spatial-temporal deep network, PoseFlow Net (PFN), to jointly solve the skeleton localization and matching problems of PoseFlow. Comprehensive experiments show that PFN outperforms the state-of-the-art deep flow estimation models in generating PoseFlow. Moreover, PoseFlow demonstrates its potential on improving two challenging tasks in human video analysis: Pose estimation and action recognition.",

author = "Dingwen Zhang and Guangyu Guo and Dong Huang and Junwei Han",

note = "Publisher Copyright: {\textcopyright} 2018 IEEE.; 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018 ; Conference date: 18-06-2018 Through 22-06-2018",

year = "2018",

month = dec,

day = "14",

doi = "10.1109/CVPR.2018.00707",

language = "English",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE Computer Society",

pages = "6762--6770",

booktitle = "Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018",

}

Zhang, D, Guo, G, Huang, D & Han, J 2018, PoseFlow: A Deep Motion Representation for Understanding Human Behaviors in Videos. in Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018., 8578805, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, pp. 6762-6770, 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, United States, 18/06/18. https://doi.org/10.1109/CVPR.2018.00707

PoseFlow: A Deep Motion Representation for Understanding Human Behaviors in Videos. / Zhang, Dingwen; Guo, Guangyu; Huang, Dong et al.
Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018. IEEE Computer Society, 2018. p. 6762-6770 8578805 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

TY - GEN

T1 - PoseFlow

T2 - 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018

AU - Zhang, Dingwen

AU - Guo, Guangyu

AU - Huang, Dong

AU - Han, Junwei

PY - 2018/12/14

Y1 - 2018/12/14

AB - Motion of the human body is the critical cue for understanding and characterizing human behavior in videos. Most existing approaches explore the motion cue using optical flows. However, optical flow usually contains motion on both the interested human bodies and the undesired background. This 'noisy' motion representation makes it very challenging for pose estimation and action recognition in real scenarios. To address this issue, this paper presents a novel deep motion representation, called PoseFlow, which reveals human motion in videos while suppressing background and motion blur, and being robust to occlusion. For learning PoseFlow with mild computational cost, we propose a functionally structured spatial-temporal deep network, PoseFlow Net (PFN), to jointly solve the skeleton localization and matching problems of PoseFlow. Comprehensive experiments show that PFN outperforms the state-of-the-art deep flow estimation models in generating PoseFlow. Moreover, PoseFlow demonstrates its potential on improving two challenging tasks in human video analysis: Pose estimation and action recognition.

UR - http://www.scopus.com/inward/record.url?scp=85052978702&partnerID=8YFLogxK

U2 - 10.1109/CVPR.2018.00707

DO - 10.1109/CVPR.2018.00707

M3 - Conference contribution

AN - SCOPUS:85052978702

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 6762

EP - 6770

BT - Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018

PB - IEEE Computer Society

Y2 - 18 June 2018 through 22 June 2018

ER -

Zhang D, Guo G, Huang D, Han J. PoseFlow: A Deep Motion Representation for Understanding Human Behaviors in Videos. In Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018. IEEE Computer Society. 2018. p. 6762-6770. 8578805. (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). doi: 10.1109/CVPR.2018.00707
