TY - JOUR
T1 - A Gesture Recognition Approach Using Multimodal Neural Network
AU - Song, Xiaoyu
AU - Chen, Hong
AU - Wang, Qing
N1 - Publisher Copyright:
© 2019 Published under licence by IOP Publishing Ltd.
PY - 2020/6/2
Y1 - 2020/6/2
N2 - Gesture recognition based on the visual modality often suffers reduced recognition rates in extreme environments, such as dim lighting or skin-colored backgrounds. When humans make judgments, they integrate information from multiple modalities, and there are natural connections between gestures and speech. Motivated by this, we propose a multimodal gesture recognition network: a 3D CNN extracts visual features, a GRU extracts speech features, and the two are fused at a late stage to make the final decision. We also adopt a two-stage structure, with a shallow network as a detector and a deep network as a classifier, to reduce memory usage and energy consumption. We build a gesture dataset recorded in a dim environment, named DarkGesture, in which people say a gesture's name as they perform it. The proposed network is then compared with single-modal recognition networks on DarkGesture, and the results show that the proposed multimodal network achieves better recognition performance.
AB - Gesture recognition based on the visual modality often suffers reduced recognition rates in extreme environments, such as dim lighting or skin-colored backgrounds. When humans make judgments, they integrate information from multiple modalities, and there are natural connections between gestures and speech. Motivated by this, we propose a multimodal gesture recognition network: a 3D CNN extracts visual features, a GRU extracts speech features, and the two are fused at a late stage to make the final decision. We also adopt a two-stage structure, with a shallow network as a detector and a deep network as a classifier, to reduce memory usage and energy consumption. We build a gesture dataset recorded in a dim environment, named DarkGesture, in which people say a gesture's name as they perform it. The proposed network is then compared with single-modal recognition networks on DarkGesture, and the results show that the proposed multimodal network achieves better recognition performance.
UR - http://www.scopus.com/inward/record.url?scp=85086363939&partnerID=8YFLogxK
U2 - 10.1088/1742-6596/1544/1/012127
DO - 10.1088/1742-6596/1544/1/012127
M3 - Conference article
AN - SCOPUS:85086363939
SN - 1742-6588
VL - 1544
JO - Journal of Physics: Conference Series
JF - Journal of Physics: Conference Series
IS - 1
M1 - 012127
T2 - 2020 5th International Conference on Intelligent Computing and Signal Processing, ICSP 2020
Y2 - 20 March 2020 through 22 March 2020
ER -