TY - JOUR
T1 - UniFi
T2 - A Unified Framework for Generalizable Gesture Recognition with Wi-Fi Signals Using Consistency-guided Multi-View Networks
AU - Liu, Yan
AU - Yu, Anlan
AU - Wang, Leye
AU - Guo, Bin
AU - Li, Yang
AU - Yi, Enze
AU - Zhang, Daqing
N1 - Publisher Copyright:
© 2024 ACM.
PY - 2024/1/12
Y1 - 2024/1/12
N2 - In recent years, considerable effort has been devoted to exploring Wi-Fi-based sensing technologies by modeling the intricate mapping between received signals and corresponding human activities. However, the inherent complexity of Wi-Fi signals poses significant challenges for practical applications due to their pronounced susceptibility to deployment environments. To address this challenge, we delve into the distinctive characteristics of Wi-Fi signals and distill three pivotal factors that can be leveraged to enhance the generalization capabilities of deep learning-based Wi-Fi sensing models: 1) effectively capturing valuable input to mitigate the adverse impact of noisy measurements; 2) adaptively fusing complementary information from multiple Wi-Fi devices to boost the distinguishability of signal patterns associated with different activities; 3) extracting generalizable features that can overcome the inconsistent representations of activities under different environmental conditions (e.g., locations, orientations). Leveraging these insights, we design a novel and unified sensing framework based on Wi-Fi signals, dubbed UniFi, and use gesture recognition as an application to demonstrate its effectiveness. UniFi achieves robust and generalizable gesture recognition in real-world scenarios by extracting discriminative and consistent features, unrelated to environmental factors, from pre-denoised signals collected by multiple transceivers. To achieve this, we first introduce an effective signal preprocessing approach that captures the applicable input data from noisy received signals for the deep learning model. Second, we propose a multi-view deep network based on spatio-temporal cross-view attention that integrates multi-carrier and multi-device signals to extract distinguishable information. Finally, we present mutual information maximization as a regularizer to learn environment-invariant representations via a contrastive loss, without requiring access to any signals from unseen environments for practical adaptation. Extensive experiments on the Widar 3.0 dataset demonstrate that our proposed framework significantly outperforms state-of-the-art approaches in different settings (99% accuracy for in-domain recognition and 90%-98% for cross-domain recognition, without additional data collection or model training).
AB - In recent years, considerable effort has been devoted to exploring Wi-Fi-based sensing technologies by modeling the intricate mapping between received signals and corresponding human activities. However, the inherent complexity of Wi-Fi signals poses significant challenges for practical applications due to their pronounced susceptibility to deployment environments. To address this challenge, we delve into the distinctive characteristics of Wi-Fi signals and distill three pivotal factors that can be leveraged to enhance the generalization capabilities of deep learning-based Wi-Fi sensing models: 1) effectively capturing valuable input to mitigate the adverse impact of noisy measurements; 2) adaptively fusing complementary information from multiple Wi-Fi devices to boost the distinguishability of signal patterns associated with different activities; 3) extracting generalizable features that can overcome the inconsistent representations of activities under different environmental conditions (e.g., locations, orientations). Leveraging these insights, we design a novel and unified sensing framework based on Wi-Fi signals, dubbed UniFi, and use gesture recognition as an application to demonstrate its effectiveness. UniFi achieves robust and generalizable gesture recognition in real-world scenarios by extracting discriminative and consistent features, unrelated to environmental factors, from pre-denoised signals collected by multiple transceivers. To achieve this, we first introduce an effective signal preprocessing approach that captures the applicable input data from noisy received signals for the deep learning model. Second, we propose a multi-view deep network based on spatio-temporal cross-view attention that integrates multi-carrier and multi-device signals to extract distinguishable information. Finally, we present mutual information maximization as a regularizer to learn environment-invariant representations via a contrastive loss, without requiring access to any signals from unseen environments for practical adaptation. Extensive experiments on the Widar 3.0 dataset demonstrate that our proposed framework significantly outperforms state-of-the-art approaches in different settings (99% accuracy for in-domain recognition and 90%-98% for cross-domain recognition, without additional data collection or model training).
KW - Channel State Information (CSI)
KW - Deep learning
KW - Gesture Recognition
KW - Wireless Sensing
UR - http://www.scopus.com/inward/record.url?scp=85182599874&partnerID=8YFLogxK
U2 - 10.1145/3631429
DO - 10.1145/3631429
M3 - Article
AN - SCOPUS:85182599874
SN - 2474-9567
VL - 7
JO - Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
JF - Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
IS - 4
M1 - 168
ER -