TY - JOUR
T1 - 3D skeleton based action recognition by video-domain translation-scale invariant mapping and multi-scale dilated CNN
AU - Li, Bo
AU - He, Mingyi
AU - Dai, Yuchao
AU - Cheng, Xuelian
AU - Chen, Yucheng
N1 - Publisher Copyright: © 2018, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2018/9/1
Y1 - 2018/9/1
N2 - In this paper, we present an image classification approach to action recognition with 3D skeleton videos. First, we propose a video-domain translation-scale invariant image mapping, which transforms 3D skeleton videos into color images, referred to as skeleton images. Second, we design a multi-scale dilated convolutional neural network (CNN) to classify the skeleton images. Our multi-scale dilated CNN model can effectively improve frequency adaptiveness and exploit discriminative temporal-spatial cues in the skeleton images. Even though skeleton images differ greatly from natural images, we show that the fine-tuning strategy still works well. Furthermore, we propose several data augmentation strategies to improve the generalization and robustness of our method. Experimental results on popular benchmark datasets, including NTU RGB+D, UTD-MHAD, MSRC-12 and G3D, demonstrate the superiority of our approach, which outperforms state-of-the-art methods by a large margin.
AB - In this paper, we present an image classification approach to action recognition with 3D skeleton videos. First, we propose a video-domain translation-scale invariant image mapping, which transforms 3D skeleton videos into color images, referred to as skeleton images. Second, we design a multi-scale dilated convolutional neural network (CNN) to classify the skeleton images. Our multi-scale dilated CNN model can effectively improve frequency adaptiveness and exploit discriminative temporal-spatial cues in the skeleton images. Even though skeleton images differ greatly from natural images, we show that the fine-tuning strategy still works well. Furthermore, we propose several data augmentation strategies to improve the generalization and robustness of our method. Experimental results on popular benchmark datasets, including NTU RGB+D, UTD-MHAD, MSRC-12 and G3D, demonstrate the superiority of our approach, which outperforms state-of-the-art methods by a large margin.
KW - 3D skeleton
KW - CNN
KW - Image mapping
KW - Recognition
UR - http://www.scopus.com/inward/record.url?scp=85041558131&partnerID=8YFLogxK
U2 - 10.1007/s11042-018-5642-0
DO - 10.1007/s11042-018-5642-0
M3 - Article
AN - SCOPUS:85041558131
SN - 1380-7501
VL - 77
SP - 22901
EP - 22921
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 17
ER -