Abstract
This paper proposes a novel multimodal fusion network (MRDFNet) for egocentric hand action recognition from RGB-D videos. First, three separate streams extract spatio-temporal features from the individual modalities: RGB frames, optical flow stacks, and depth frames. For the RGB and depth streams, an Attention-based Bidirectional Long Short-Term Memory network (Bi-LSTA) identifies regions of interest both spatially and temporally. The extracted features are then fed into a fusion module that produces an integrated feature, which is used for the final egocentric hand action recognition. The fusion module learns complementary information from the modalities, i.e., it preserves the distinctive properties of each modality while exploring the properties shared across modalities. Experimental results on both the self-collected RGB-D Egocentric Manual Operation Dataset in Electrical Substations (REMOD-ES) and the THU-READ dataset of daily-life actions show the superiority of the proposed approach over state-of-the-art methods.
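The pipeline described in the abstract (per-stream features, attention over time, then a fusion step that keeps modality-specific and shared information) can be illustrated with a minimal sketch. This is not the MRDFNet implementation: the abstract does not specify layer shapes, the attention form, or the fusion operator, so everything below is an assumption made for illustration. The names `attention_pool`, `fuse`, `query`, `private`, and `shared` are hypothetical, dot-product attention stands in for Bi-LSTA, and fixed matrices stand in for learned weights.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: Sequence[Vector], query: Vector) -> Vector:
    """Score each per-frame feature against `query` (dot product) and return
    the softmax-weighted sum. Assumed stand-in for temporal attention."""
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]


def matvec(matrix: Sequence[Vector], vec: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def fuse(
    features: Dict[str, Vector],
    private: Dict[str, Sequence[Vector]],
    shared: Sequence[Vector],
) -> Vector:
    """Concatenate one modality-specific projection per stream with the mean
    of a projection whose weights are shared by all streams (an assumption,
    not the fusion operator from the paper)."""
    names = sorted(features)  # fixed order so the output layout is stable
    private_parts = [x for n in names for x in matvec(private[n], features[n])]
    shared_each = [matvec(shared, features[n]) for n in names]
    shared_mean = [sum(col) / len(names) for col in zip(*shared_each)]
    return private_parts + shared_mean
```

In this sketch the modality-specific projections are kept side by side so that no stream overwrites another, while the shared projection is averaged so that it captures only what a common set of weights extracts from every stream. A classifier over the concatenated vector would follow in a full model.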
| Original language | English |
|---|---|
| Title of host publication | 2023 IEEE International Conference on Energy Technologies for Future Grids, ETFG 2023 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9781665471640 |
| State | Published - 2023 |
| Event | 2023 IEEE International Conference on Energy Technologies for Future Grids, ETFG 2023, Wollongong, Australia. Duration: 3 Dec 2023 → 6 Dec 2023 |
Publication series
| Name | 2023 IEEE International Conference on Energy Technologies for Future Grids, ETFG 2023 |
|---|---|
Conference
| Conference | 2023 IEEE International Conference on Energy Technologies for Future Grids, ETFG 2023 |
|---|---|
| Country/Territory | Australia |
| City | Wollongong |
| Period | 3 Dec 2023 → 6 Dec 2023 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 7: Affordable and Clean Energy
Keywords
- attention mechanism
- egocentric video
- hand action recognition
- human-object interaction
- multimodal data
Title
Hand Action Recognition from RGB-D Egocentric Videos in Substations Operations and Maintenance