TY - GEN
T1 - Video Action Recognition Based on Deeper Convolution Networks with Pair-Wise Frame Motion Concatenation
AU - Han, Yamin
AU - Zhang, Peng
AU - Zhuo, Tao
AU - Huang, Wei
AU - Zhang, Yanning
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/8/22
Y1 - 2017/8/22
N2 - Strategies based on deep convolutional networks have shown remarkable performance in various recognition tasks. Unfortunately, in many realistic scenarios, accurate and robust recognition remains difficult, especially for videos. Challenges such as cluttered backgrounds and viewpoint changes can produce large intrinsic and extrinsic class variations. In addition, data deficiency can degrade the model during learning and updating. Therefore, effectively incorporating frame-wise motion into the learning model on the fly has become increasingly attractive in contemporary video analysis studies. To overcome these limitations, we propose a deeper convolutional network approach with pair-wise motion concatenation, named deep temporal convolutional networks. A temporal motion accumulation mechanism is introduced as an effective data input for the learning of the convolutional networks. Specifically, to handle possible data deficiency, the beneficial practices of transferring ResNet-101 weights and data variation augmentation are also utilized for robust recognition. Experiments on the challenging UCF101 and ODAR datasets verify favorable performance compared with other state-of-the-art works.
AB - Strategies based on deep convolutional networks have shown remarkable performance in various recognition tasks. Unfortunately, in many realistic scenarios, accurate and robust recognition remains difficult, especially for videos. Challenges such as cluttered backgrounds and viewpoint changes can produce large intrinsic and extrinsic class variations. In addition, data deficiency can degrade the model during learning and updating. Therefore, effectively incorporating frame-wise motion into the learning model on the fly has become increasingly attractive in contemporary video analysis studies. To overcome these limitations, we propose a deeper convolutional network approach with pair-wise motion concatenation, named deep temporal convolutional networks. A temporal motion accumulation mechanism is introduced as an effective data input for the learning of the convolutional networks. Specifically, to handle possible data deficiency, the beneficial practices of transferring ResNet-101 weights and data variation augmentation are also utilized for robust recognition. Experiments on the challenging UCF101 and ODAR datasets verify favorable performance compared with other state-of-the-art works.
UR - http://www.scopus.com/inward/record.url?scp=85030248594&partnerID=8YFLogxK
U2 - 10.1109/CVPRW.2017.162
DO - 10.1109/CVPRW.2017.162
M3 - Conference contribution
AN - SCOPUS:85030248594
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 1226
EP - 1235
BT - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2017
PB - IEEE Computer Society
T2 - 30th IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2017
Y2 - 21 July 2017 through 26 July 2017
ER -