A Dual Pipeline with Spatio-Temporal Attention Fusion Approach for Human Activity Recognition

Xiaodong Wang, Ying Li, Aiqing Fang, Pei He, Yangming Guo

Research output: Contribution to journal, Article, peer-review

1 Scopus citation

Abstract

Sensor-based human activity recognition (SHAR) has gained increasing attention due to the rapid development of the Internet of Things (IoT). A critical issue for SHAR is overcoming the performance bottleneck caused by expensive feature engineering. Recent works have explored combining hybrid neural networks to improve the SHAR model architecture so that it learns informative representations. However, existing studies have not adequately provided a hierarchical structure that can represent human activities and capture the specific representations hidden beneath interrelated low-level human activity sequences. In this work, we introduce a dual pipeline with a spatio-temporal attention fusion approach, termed the ST-attention dual pipeline, to address this problem. Specifically, the ST-attention dual pipeline employs sequence learning techniques in one pipeline to capture complex dependencies within behavior data and residual learning techniques in the other pipeline to extract hierarchical details. It then fuses the two outputs with an ST-attention fusion mechanism, computed across the spatial and temporal dimensions, to improve representation capability. Extensive experiments on public datasets (i.e., OPPORTUNITY, PAMAP2, and USC-HAD) show that the ST-attention dual pipeline yields compelling results and that the spatio-temporal attention mechanism outperforms other fusion methods.
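
The abstract does not give the fusion formula, so the following is only a minimal illustrative sketch of what attention-weighted fusion across temporal and spatial (channel) dimensions can look like. Everything in it is an assumption made for illustration: the function name st_attention_fuse, the (time x channel) list layout, the element-wise sum of the two pipeline outputs, and the mean-based softmax scoring are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def st_attention_fuse(seq_feat, res_feat):
    """Fuse two (T x C) feature maps with temporal and channel attention.

    seq_feat: output of the (hypothetical) sequence-learning pipeline.
    res_feat: output of the (hypothetical) residual-learning pipeline.
    Both are lists of T rows, each holding C channel values.
    This scoring scheme is an assumption, not the paper's mechanism.
    """
    n_steps, n_channels = len(seq_feat), len(seq_feat[0])

    # Combine the two pipelines element-wise before scoring.
    summed = [
        [seq_feat[t][c] + res_feat[t][c] for c in range(n_channels)]
        for t in range(n_steps)
    ]

    # Temporal attention: one weight per time step (mean over channels).
    temporal = softmax([sum(row) / n_channels for row in summed])

    # Spatial attention: one weight per channel (mean over time).
    spatial = softmax(
        [sum(summed[t][c] for t in range(n_steps)) / n_steps
         for c in range(n_channels)]
    )

    # Re-weight every cell by both attentions, then pool over time.
    fused = [
        sum(temporal[t] * spatial[c] * summed[t][c] for t in range(n_steps))
        for c in range(n_channels)
    ]
    return temporal, spatial, fused
```

Called with two small feature maps of equal shape, the function returns one weight per time step, one weight per channel, and a fused vector with one value per channel. A trained model would learn the attention scores rather than derive them from plain means.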

Original language: English
Pages (from-to): 25150-25162
Number of pages: 13
Journal: IEEE Sensors Journal
Volume: 24
Issue number: 15
State: Published, 2024

Keywords

  • Attention mechanism
  • depthwise separable convolution (DSC)
  • human activity recognition (HAR)
  • hybrid neural network
  • wearable sensors
