A Gesture Recognition Approach Using Multimodal Neural Network

Xiaoyu Song; Hong Chen; Qing Wang

doi:10.1088/1742-6596/1544/1/012127

China Agricultural University

Research output: Contribution to journal › Conference article › Peer-reviewed

1 citation (Scopus)

Abstract

Gesture recognition based on the visual modality often suffers a reduced recognition rate in extreme environments, such as dim lighting or backgrounds close to skin color. When humans make judgments, they integrate information from several modalities, and gestures and speech are likely to be related as well. Based on this, we propose a multimodal gesture recognition network. We use a 3D CNN to extract visual features and a GRU to extract speech features, and fuse them at a late stage to make the final decision. We also use a two-stage structure, with a shallow network as the detector and a deep network as the classifier, to reduce memory usage and energy consumption. We built a gesture dataset recorded in a dim environment, named DarkGesture, in which participants say the gesture's name while performing the gesture. We then compare the proposed network with the single-modal recognition network on DarkGesture. The results show that the proposed multimodal network achieves better recognition performance.
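The abstract does not specify how the two branches are combined beyond "late fusion", nor how the detector hands off to the classifier. The following is a minimal sketch, assuming each branch (the 3D CNN for video and the GRU for audio, neither implemented here) already emits per-class logits, that fusion is a weighted average of the two softmax distributions, and that the shallow detector returns a single score compared against a fixed threshold. The function names `softmax`, `late_fusion` and `two_stage_predict`, the equal default weighting and the 0.5 threshold are hypothetical illustrations, not the authors' implementation.

```python
"""Hypothetical sketch: decision-level (late) fusion plus a two-stage gate.

Assumptions, not taken from the paper: each modality branch already outputs
per-class logits, fusion is a weighted average of softmax probabilities, and
the shallow detector returns one score compared against a fixed threshold.
"""
import math
from typing import Callable, List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(
    visual_logits: Sequence[float],
    speech_logits: Sequence[float],
    visual_weight: float = 0.5,  # hypothetical equal weighting
) -> List[float]:
    """Average the two modalities' class probabilities (late fusion)."""
    if len(visual_logits) != len(speech_logits):
        raise ValueError("both branches must score the same gesture classes")
    pv = softmax(visual_logits)
    ps = softmax(speech_logits)
    return [visual_weight * a + (1.0 - visual_weight) * b for a, b in zip(pv, ps)]


def two_stage_predict(
    clip: object,
    detector: Callable[[object], float],
    classifier: Callable[[object], Tuple[Sequence[float], Sequence[float]]],
    threshold: float = 0.5,  # hypothetical detection threshold
) -> Optional[int]:
    """Run the deep multimodal classifier only if the shallow detector fires."""
    if detector(clip) < threshold:
        return None  # no gesture detected: the deep network is never invoked
    visual_logits, speech_logits = classifier(clip)
    fused = late_fusion(visual_logits, speech_logits)
    # Predicted gesture = index of the highest fused probability.
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    fused = late_fusion([0.2, 0.1, 0.0], [0.0, 3.0, 0.0])
    print([round(p, 3) for p in fused])
```

For example, with `visual_logits = [0.2, 0.1, 0.0]` (a near-flat, low-confidence visual output of the kind a dim scene might produce) and `speech_logits = [0.0, 3.0, 0.0]`, `late_fusion` returns approximately `[0.206, 0.621, 0.173]`, so the fused prediction is class 1 although the visual branch alone would pick class 0. Gating on the detector means the expensive classifier runs only on clips the detector flags, which is where a two-stage design saves memory and energy.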

Original language	English
Article number	012127
Journal	Journal of Physics: Conference Series
Volume	1544
Issue number	1
DOI	https://doi.org/10.1088/1742-6596/1544/1/012127
Publication status	Published - 2 Jun 2020
Externally published	Yes
Event	2020 5th International Conference on Intelligent Computing and Signal Processing, ICSP 2020 - Suzhou, China. Duration: 20 Mar 2020 → 22 Mar 2020

Cite this

@article{cd3cbff6f3494b11b7430db0c048f1e2,

title = "A Gesture Recognition Approach Using Multimodal Neural Network",

author = "Xiaoyu Song and Hong Chen and Qing Wang",

note = "Publisher Copyright: {\textcopyright} 2019 Published under licence by IOP Publishing Ltd.; 2020 5th International Conference on Intelligent Computing and Signal Processing, ICSP 2020 ; Conference date: 20-03-2020 Through 22-03-2020",

year = "2020",

month = jun,

day = "2",

doi = "10.1088/1742-6596/1544/1/012127",

language = "English",

volume = "1544",

journal = "Journal of Physics: Conference Series",

issn = "1742-6588",

publisher = "IOP Publishing Ltd.",

number = "1",

}

TY - JOUR

T1 - A Gesture Recognition Approach Using Multimodal Neural Network

AU - Song, Xiaoyu

AU - Chen, Hong

AU - Wang, Qing

PY - 2020/6/2

Y1 - 2020/6/2

UR - http://www.scopus.com/inward/record.url?scp=85086363939&partnerID=8YFLogxK

U2 - 10.1088/1742-6596/1544/1/012127

DO - 10.1088/1742-6596/1544/1/012127

M3 - Conference article

AN - SCOPUS:85086363939

SN - 1742-6588

VL - 1544

JO - Journal of Physics: Conference Series

JF - Journal of Physics: Conference Series

IS - 1

M1 - 012127

T2 - 2020 5th International Conference on Intelligent Computing and Signal Processing, ICSP 2020

Y2 - 20 March 2020 through 22 March 2020

ER -
