Video Action Recognition Based on Deeper Convolution Networks with Pair-Wise Frame Motion Concatenation

Yamin Han; Peng Zhang; Tao Zhuo; Wei Huang; Yanning Zhang

doi:10.1109/CVPRW.2017.162

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

7 citations (Scopus)

Abstract

Strategies based on deep convolutional networks have shown remarkable performance in a range of recognition tasks. In many realistic scenarios, however, accurate and robust recognition remains hard, especially for video. Challenges such as cluttered backgrounds and viewpoint changes can produce large intrinsic and extrinsic class variations. In addition, data deficiency can degrade the designed model during learning and updating. Incorporating frame-wise motion into the learning model on the fly has therefore become increasingly attractive in contemporary video analysis. To overcome these limitations, we propose an approach based on deeper convolutional networks with pairwise motion concatenation, which we name deep temporal convolutional networks. A temporal motion accumulation mechanism is introduced as an effective form of input for training the convolutional networks. To handle possible data deficiency, we also transfer ResNet-101 weights and apply data variation augmentation for robust recognition. Experiments on the challenging UCF101 dataset and the ODAR dataset show favorable performance compared with other state-of-the-art methods.
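
As a rough illustration of the pairwise motion concatenation idea in the abstract, the following Python sketch stacks one motion map per pair of consecutive frames along a channel axis. The abstract does not say how motion is computed, so absolute frame differencing is used here purely as a stand-in assumption. The function name `pairwise_motion_stack` and the array layout are likewise hypothetical and are not taken from the paper.

```python
import numpy as np


def pairwise_motion_stack(frames: np.ndarray) -> np.ndarray:
    """Stack simple motion maps from consecutive frame pairs along a channel axis.

    frames: grayscale clip of shape (T, H, W) with T >= 2.
    Returns an array of shape (H, W, T - 1), one channel per frame pair.
    """
    if frames.ndim != 3 or frames.shape[0] < 2:
        raise ValueError("expected grayscale frames of shape (T, H, W) with T >= 2")
    # Convert before subtracting so uint8 inputs do not wrap around.
    frames = frames.astype(np.float32)
    # Assumption: absolute frame difference stands in for the motion estimate.
    motion = np.abs(frames[1:] - frames[:-1])  # shape (T - 1, H, W)
    # Move the pair index to the last axis so it acts as the channel dimension.
    return np.moveaxis(motion, 0, -1)  # shape (H, W, T - 1)


# Toy clip: four 8x8 frames with a single bright pixel in the second frame.
clip = np.zeros((4, 8, 8), dtype=np.uint8)
clip[1, 2, 2] = 255
stack = pairwise_motion_stack(clip)
print(stack.shape)  # (8, 8, 3)
```

A stack like this could serve as the multi-channel input to a convolutional network. In practice a dense optical flow estimate would likely replace the frame difference.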

Original language	English
Title of host publication	Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2017
Publisher	IEEE Computer Society
Pages	1226-1235
Number of pages	10
ISBN (electronic)	9781538607336
DOI	https://doi.org/10.1109/CVPRW.2017.162
Publication status	Published - 22 Aug 2017
Event	30th IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2017 - Honolulu, United States. Duration: 21 Jul 2017 → 26 Jul 2017

Publication series

Name	IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Volume	2017-July
ISSN (print)	2160-7508
ISSN (electronic)	2160-7516

Conference

Conference	30th IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2017
Country/Territory	United States
City	Honolulu
Period	21/07/17 → 26/07/17

Cite this

Han, Y., Zhang, P., Zhuo, T., Huang, W., & Zhang, Y. (2017). Video Action Recognition Based on Deeper Convolution Networks with Pair-Wise Frame Motion Concatenation. In Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2017 (pp. 1226-1235). Article 8014896 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; Vol. 2017-July). IEEE Computer Society. https://doi.org/10.1109/CVPRW.2017.162

Han, Yamin ; Zhang, Peng ; Zhuo, Tao et al. / Video Action Recognition Based on Deeper Convolution Networks with Pair-Wise Frame Motion Concatenation. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2017. IEEE Computer Society, 2017. pp. 1226-1235 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops).

@inproceedings{1de67b13d7ed49969020a00494ab45b4,

title = "Video Action Recognition Based on Deeper Convolution Networks with Pair-Wise Frame Motion Concatenation",

author = "Yamin Han and Peng Zhang and Tao Zhuo and Wei Huang and Yanning Zhang",

note = "Publisher Copyright: {\textcopyright} 2017 IEEE.; 30th IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2017 ; Conference date: 21-07-2017 Through 26-07-2017",

year = "2017",

month = aug,

day = "22",

doi = "10.1109/CVPRW.2017.162",

language = "English",

series = "IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops",

publisher = "IEEE Computer Society",

pages = "1226--1235",

booktitle = "Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2017",

}

Han, Y, Zhang, P, Zhuo, T, Huang, W & Zhang, Y 2017, Video Action Recognition Based on Deeper Convolution Networks with Pair-Wise Frame Motion Concatenation. in Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2017., 8014896, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, vol. 2017-July, IEEE Computer Society, pp. 1226-1235, 30th IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2017, Honolulu, United States, 21/07/17. https://doi.org/10.1109/CVPRW.2017.162

Video Action Recognition Based on Deeper Convolution Networks with Pair-Wise Frame Motion Concatenation. / Han, Yamin; Zhang, Peng; Zhuo, Tao et al.
Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2017. IEEE Computer Society, 2017. pp. 1226-1235 8014896 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; Vol. 2017-July).

TY - GEN

T1 - Video Action Recognition Based on Deeper Convolution Networks with Pair-Wise Frame Motion Concatenation

AU - Han, Yamin

AU - Zhang, Peng

AU - Zhuo, Tao

AU - Huang, Wei

AU - Zhang, Yanning

PY - 2017/8/22

Y1 - 2017/8/22

UR - http://www.scopus.com/inward/record.url?scp=85030248594&partnerID=8YFLogxK

U2 - 10.1109/CVPRW.2017.162

DO - 10.1109/CVPRW.2017.162

M3 - Conference contribution

AN - SCOPUS:85030248594

T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

SP - 1226

EP - 1235

BT - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2017

PB - IEEE Computer Society

T2 - 30th IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2017

Y2 - 21 July 2017 through 26 July 2017

ER -

Han Y, Zhang P, Zhuo T, Huang W, Zhang Y. Video Action Recognition Based on Deeper Convolution Networks with Pair-Wise Frame Motion Concatenation. In Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2017. IEEE Computer Society. 2017. p. 1226-1235. 8014896. (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops). doi: 10.1109/CVPRW.2017.162
