Hand Action Recognition from RGB-D Egocentric Videos in Substations Operations and Maintenance

Yiyang Yao, Xue Wang, Guoqing Zhou, Qing Wang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

This paper proposes a novel multimodal fusion network (MRDFNet) for egocentric hand action recognition from RGB-D videos. First, three separate streams extract spatio-temporal features from three modalities: RGB frames, optical flow stacks, and depth frames. For the RGB and depth streams in particular, an Attention-based Bidirectional Long Short-Term Memory network (Bi-LSTA) identifies regions of interest both spatially and temporally. The extracted features are then fed into a fusion module to obtain an integrated feature, which is finally used for egocentric hand action recognition. The fusion module learns complementary information from the modalities, i.e., it preserves the distinctive properties of each modality while exploiting the properties shared across modalities. Experimental results on both the self-collected RGB-D Egocentric Manual Operation Dataset in Electrical Substations (REMOD-ES) and the THU-READ dataset of daily-life actions show that the proposed approach outperforms state-of-the-art methods.
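The data flow described in the abstract (per-modality streams, attention over time for RGB and depth, then fusion into one feature) can be illustrated with a toy sketch. This is not MRDFNet or Bi-LSTA: the function names, the dot-product attention, the mean pooling for the flow stream, and the "concatenate plus element-wise mean" fusion are all assumptions chosen only to make the idea concrete in plain Python.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: List[Vector], query: Vector) -> Vector:
    """Temporal attention (assumed form): weight each per-frame feature
    by softmax(dot(frame, query)) and sum over time."""
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(dim)]


def mean_pool(frames: List[Vector]) -> Vector:
    """Plain temporal average, used here for the stream without attention."""
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def fuse(stream_features: Dict[str, Vector]) -> Vector:
    """Hypothetical fusion: keep each modality's feature (distinctive part)
    and append their element-wise mean (shared part)."""
    names = sorted(stream_features)  # fixed order: depth, flow, rgb
    vectors = [stream_features[n] for n in names]
    dim = len(vectors[0])
    shared = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    distinctive = [x for v in vectors for x in v]
    return distinctive + shared


# Made-up per-frame features: 2 time steps, 2 dimensions per modality.
rgb = [[1.0, 0.0], [0.0, 1.0]]
flow = [[0.5, 0.5], [0.5, 0.5]]
depth = [[2.0, 0.0], [2.0, 0.0]]
query = [0.0, 0.0]  # a zero query gives uniform attention weights

features = {
    "rgb": attention_pool(rgb, query),
    "flow": mean_pool(flow),
    "depth": attention_pool(depth, query),
}
fused = fuse(features)  # 3 modalities x 2 dims + 2 shared dims = 8 values
```

In a real system the fused vector would feed a classifier, and the per-frame features, attention, and fusion would be learned rather than hand-written as they are here.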

Original language: English
Title of host publication: 2023 IEEE International Conference on Energy Technologies for Future Grids, ETFG 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665471640
State: Published - 2023
Event: 2023 IEEE International Conference on Energy Technologies for Future Grids, ETFG 2023 - Wollongong, Australia
Duration: 3 Dec 2023 to 6 Dec 2023

Publication series

Name: 2023 IEEE International Conference on Energy Technologies for Future Grids, ETFG 2023

Conference

Conference: 2023 IEEE International Conference on Energy Technologies for Future Grids, ETFG 2023
Country/Territory: Australia
City: Wollongong
Period: 3/12/23 to 6/12/23

Keywords

  • attention mechanism
  • egocentric video
  • hand action recognition
  • human-object interaction
  • multimodal data
