TY - GEN
T1 - Learning shape-motion representations from geometric algebra spatio-temporal model for skeleton-based action recognition
AU - Li, Yanshan
AU - Xia, Rongjie
AU - Liu, Xing
AU - Huang, Qinghua
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/7
Y1 - 2019/7
AB - Skeleton-based action recognition has been widely applied in intelligent video surveillance and human behavior analysis. Previous works have successfully applied Convolutional Neural Networks (CNNs) to learn spatio-temporal characteristics of skeleton sequences. However, they focus merely on the coordinates of isolated joints, ignoring the spatial relationships between joints and learning motion representations only implicitly. To address these problems, we propose an effective method for learning comprehensive representations from skeleton sequences using Geometric Algebra. First, a frontal-orientation-based spatio-temporal model is constructed to represent the spatial configuration and temporal dynamics of skeleton sequences, which is robust to view variations. Then, mutually compensating shape-motion representations are learned to describe skeleton actions comprehensively. Finally, a multi-stream CNN model is applied to extract and fuse deep features from the complementary shape-motion representations. Experimental results on the NTU RGB+D and Northwestern-UCLA datasets consistently verify the superiority of our method.
KW - Geometric algebra
KW - Human action recognition
KW - Skeleton sequence
KW - Spatio-temporal model
UR - http://www.scopus.com/inward/record.url?scp=85071022633&partnerID=8YFLogxK
U2 - 10.1109/ICME.2019.00187
DO - 10.1109/ICME.2019.00187
M3 - Conference contribution
AN - SCOPUS:85071022633
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
SP - 1066
EP - 1071
BT - Proceedings - 2019 IEEE International Conference on Multimedia and Expo, ICME 2019
PB - IEEE Computer Society
T2 - 2019 IEEE International Conference on Multimedia and Expo, ICME 2019
Y2 - 8 July 2019 through 12 July 2019
ER -