Early Action Prediction with Generative Adversarial Networks

Dong Wang; Yuan Yuan; Qi Wang

doi:10.1109/ACCESS.2019.2904857

School of Artificial Intelligence, OPtics and Electronics

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

23 Scopus citations

Abstract

Action prediction aims to determine what action is occurring in a video as early as possible, which is crucial to many online applications, such as predicting a traffic accident before it happens and detecting malicious actions in a monitoring system. In this paper, we address this problem by developing an end-to-end architecture that improves the discriminability of features of partially observed videos by assimilating them to features from complete videos. For this purpose, a generative adversarial network is introduced to tackle the action prediction problem; it improves the recognition accuracy of partially observed videos by narrowing the feature difference between partially observed videos and complete ones. Specifically, its generator comprises two networks: a CNN for feature extraction and an LSTM for estimating the residual error between the features of partially observed videos and those of complete ones. The residual error from the LSTM is added to the features from the CNN, and the sum is regarded as the enhanced feature used to fool a competing discriminator. Meanwhile, the generator is trained with an additional perceptual objective, which forces the enhanced features of partially observed videos to be discriminative enough for action prediction. Extensive experimental results on the UCF101, BIT, and UT-Interaction datasets demonstrate that our approach outperforms state-of-the-art methods, especially for videos in which less than 50% of the frames are observed.
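The data flow of the generator described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the CNN output is taken as a given feature vector, the LSTM is replaced by a toy elementwise affine map, the discriminator and classifier are toy linear models, and the perceptual objective is assumed here to be a cross-entropy on the action label. All names (`estimate_residual`, `enhance`, `discriminator`, `generator_loss`, `lam`) are hypothetical and introduced only for illustration.

```python
import math

def estimate_residual(partial_feature, weights, bias):
    """Toy stand-in for the LSTM (assumption): an elementwise affine map
    that guesses the gap between partial-video and complete-video features."""
    return [w * x + b for x, w, b in zip(partial_feature, weights, bias)]

def enhance(partial_feature, residual):
    """Enhanced feature = feature of the partial video + estimated residual."""
    return [x + r for x, r in zip(partial_feature, residual)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def discriminator(feature, d_weights):
    """Toy linear discriminator (assumption): probability that the feature
    was extracted from a complete video."""
    return sigmoid(sum(w * x for w, x in zip(d_weights, feature)))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generator_loss(enhanced, label, d_weights, class_weights, lam=1.0):
    """Adversarial term (fool the discriminator) plus a classification term
    standing in for the perceptual objective (assumed cross-entropy)."""
    adversarial = -math.log(discriminator(enhanced, d_weights))
    logits = [sum(w * x for w, x in zip(row, enhanced)) for row in class_weights]
    classification = -math.log(softmax(logits)[label])
    return adversarial + lam * classification

# Example with made-up numbers: a 3-dimensional feature and 2 action classes.
partial = [0.2, -0.1, 0.5]
residual = estimate_residual(partial, [0.5, 0.5, 0.5], [0.1, 0.0, -0.1])
enhanced = enhance(partial, residual)
loss = generator_loss(enhanced, 1, [0.3, -0.2, 0.1],
                      [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

In a real system the residual estimator, discriminator, and classifier would be trained networks; the sketch only shows how the enhanced feature is formed and how the two generator terms might be combined.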

Original language	English
Article number	8666721
Pages (from-to)	35795-35804
Number of pages	10
Journal	IEEE Access
Volume	7
DOIs	https://doi.org/10.1109/ACCESS.2019.2904857
State	Published - 2019

Keywords

action prediction
Computer vision
video analysis

Cite this

@article{dd503b541d634ed391475e12ff21de73,

title = "Early Action Prediction with Generative Adversarial Networks",

abstract = "Action prediction aims to determine what action is occurring in a video as early as possible, which is crucial to many online applications, such as predicting a traffic accident before it happens and detecting malicious actions in a monitoring system. In this paper, we address this problem by developing an end-to-end architecture that improves the discriminability of features of partially observed videos by assimilating them to features from complete videos. For this purpose, a generative adversarial network is introduced to tackle the action prediction problem; it improves the recognition accuracy of partially observed videos by narrowing the feature difference between partially observed videos and complete ones. Specifically, its generator comprises two networks: a CNN for feature extraction and an LSTM for estimating the residual error between the features of partially observed videos and those of complete ones. The residual error from the LSTM is added to the features from the CNN, and the sum is regarded as the enhanced feature used to fool a competing discriminator. Meanwhile, the generator is trained with an additional perceptual objective, which forces the enhanced features of partially observed videos to be discriminative enough for action prediction. Extensive experimental results on the UCF101, BIT, and UT-Interaction datasets demonstrate that our approach outperforms state-of-the-art methods, especially for videos in which less than 50\% of the frames are observed.",

keywords = "action prediction, Computer vision, video analysis",

author = "Dong Wang and Yuan Yuan and Qi Wang",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2019",

doi = "10.1109/ACCESS.2019.2904857",

language = "English",

volume = "7",

pages = "35795--35804",

journal = "IEEE Access",

issn = "2169-3536",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Early Action Prediction with Generative Adversarial Networks

AU - Wang, Dong

AU - Yuan, Yuan

AU - Wang, Qi

PY - 2019

Y1 - 2019

N2 - Action prediction aims to determine what action is occurring in a video as early as possible, which is crucial to many online applications, such as predicting a traffic accident before it happens and detecting malicious actions in a monitoring system. In this paper, we address this problem by developing an end-to-end architecture that improves the discriminability of features of partially observed videos by assimilating them to features from complete videos. For this purpose, a generative adversarial network is introduced to tackle the action prediction problem; it improves the recognition accuracy of partially observed videos by narrowing the feature difference between partially observed videos and complete ones. Specifically, its generator comprises two networks: a CNN for feature extraction and an LSTM for estimating the residual error between the features of partially observed videos and those of complete ones. The residual error from the LSTM is added to the features from the CNN, and the sum is regarded as the enhanced feature used to fool a competing discriminator. Meanwhile, the generator is trained with an additional perceptual objective, which forces the enhanced features of partially observed videos to be discriminative enough for action prediction. Extensive experimental results on the UCF101, BIT, and UT-Interaction datasets demonstrate that our approach outperforms state-of-the-art methods, especially for videos in which less than 50% of the frames are observed.

AB - Action prediction aims to determine what action is occurring in a video as early as possible, which is crucial to many online applications, such as predicting a traffic accident before it happens and detecting malicious actions in a monitoring system. In this paper, we address this problem by developing an end-to-end architecture that improves the discriminability of features of partially observed videos by assimilating them to features from complete videos. For this purpose, a generative adversarial network is introduced to tackle the action prediction problem; it improves the recognition accuracy of partially observed videos by narrowing the feature difference between partially observed videos and complete ones. Specifically, its generator comprises two networks: a CNN for feature extraction and an LSTM for estimating the residual error between the features of partially observed videos and those of complete ones. The residual error from the LSTM is added to the features from the CNN, and the sum is regarded as the enhanced feature used to fool a competing discriminator. Meanwhile, the generator is trained with an additional perceptual objective, which forces the enhanced features of partially observed videos to be discriminative enough for action prediction. Extensive experimental results on the UCF101, BIT, and UT-Interaction datasets demonstrate that our approach outperforms state-of-the-art methods, especially for videos in which less than 50% of the frames are observed.

KW - action prediction

KW - Computer vision

KW - video analysis

UR - http://www.scopus.com/inward/record.url?scp=85063964249&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2019.2904857

DO - 10.1109/ACCESS.2019.2904857

M3 - Article

AN - SCOPUS:85063964249

SN - 2169-3536

VL - 7

SP - 35795

EP - 35804

JO - IEEE Access

JF - IEEE Access

M1 - 8666721

ER -
