TY - GEN
T1 - Hand Action Recognition from RGB-D Egocentric Videos in Substations Operations and Maintenance
AU - Yao, Yiyang
AU - Wang, Xue
AU - Zhou, Guoqing
AU - Wang, Qing
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - This paper proposes a novel multimodal fusion network (MRDFNet) for egocentric hand action recognition from RGB-D videos. First, we use three separate streams to extract individual spatio-temporal features for the different modalities: RGB frames, optical flow stacks, and depth frames. In particular, for the RGB and depth streams, an Attention-based Bidirectional Long Short-Term Memory network (Bi-LSTA) is used to identify regions of interest both spatially and temporally. The extracted features are then fed into a fusion module to obtain an integrated feature, which is finally used for egocentric hand action recognition. The fusion module learns complementary information from the multiple modalities, i.e., it preserves the distinctive properties of each modality while exploring the properties shared across modalities. Experimental results on both the self-collected RGB-D Egocentric Manual Operation Dataset in Electrical Substations (REMOD-ES) and the THU-READ dataset of daily-life actions show the superiority of the proposed approach over state-of-the-art methods.
AB - This paper proposes a novel multimodal fusion network (MRDFNet) for egocentric hand action recognition from RGB-D videos. First, we use three separate streams to extract individual spatio-temporal features for the different modalities: RGB frames, optical flow stacks, and depth frames. In particular, for the RGB and depth streams, an Attention-based Bidirectional Long Short-Term Memory network (Bi-LSTA) is used to identify regions of interest both spatially and temporally. The extracted features are then fed into a fusion module to obtain an integrated feature, which is finally used for egocentric hand action recognition. The fusion module learns complementary information from the multiple modalities, i.e., it preserves the distinctive properties of each modality while exploring the properties shared across modalities. Experimental results on both the self-collected RGB-D Egocentric Manual Operation Dataset in Electrical Substations (REMOD-ES) and the THU-READ dataset of daily-life actions show the superiority of the proposed approach over state-of-the-art methods.
KW - attention mechanism
KW - egocentric video
KW - hand action recognition
KW - human-object interaction
KW - multimodal data
UR - http://www.scopus.com/inward/record.url?scp=85185765388&partnerID=8YFLogxK
U2 - 10.1109/ETFG55873.2023.10408532
DO - 10.1109/ETFG55873.2023.10408532
M3 - Conference contribution
AN - SCOPUS:85185765388
T3 - 2023 IEEE International Conference on Energy Technologies for Future Grids, ETFG 2023
BT - 2023 IEEE International Conference on Energy Technologies for Future Grids, ETFG 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE International Conference on Energy Technologies for Future Grids, ETFG 2023
Y2 - 3 December 2023 through 6 December 2023
ER -