Wi-CLIP: Toward Zero-Shot Air Gesture Recognition Based on RF-Text Foundation Model

  • Haoyu Zhang
  • Yifan Guo
  • Zhu Wang
  • Zhuo Sun
  • Bin Guo
  • Zhiwen Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Wi-Fi-based gesture recognition, driven by deep learning, holds significant promise for privacy-preserving and all-weather sensing. However, current methods typically rely on large amounts of labeled data, and Wi-Fi signals vary significantly across gestures, leading to severe performance degradation when models encounter unseen gestures. To address these challenges, we explore the potential of transferring knowledge from large pre-trained language models to improve the generalization of Wi-Fi-based gesture recognition systems. To this end, we propose a zero-shot gesture recognition framework, named Wi-CLIP. Inspired by the vision-language pre-training model CLIP, our method constructs a cross-modal radio frequency-text model centered on aligning Wi-Fi signals with textual semantics. Specifically, we develop a novel Wi-Fi signal encoder and a BERT-based text encoder, aligning the two modalities within a shared semantic space using contrastive learning. Our framework achieves an average recognition accuracy of 89.12% across 6 gestures. Notably, when trained on only 5 gestures, Wi-CLIP demonstrates a remarkable zero-shot recognition accuracy of 78.79% on the sixth, previously unseen gesture. This highlights its strong generalization capability and effectiveness in cross-modal representation learning.
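
The alignment step described in the abstract can be illustrated with a minimal sketch of a CLIP-style symmetric contrastive objective and the zero-shot lookup it enables. This is not the authors' implementation: the embeddings are plain lists standing in for the outputs of the Wi-Fi signal encoder and the BERT-based text encoder, and the function names, the temperature value of 0.07, and the gesture labels are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_logits(rf_embs, text_embs, temperature=0.07):
    """Cosine similarity of every RF embedding with every text embedding,
    divided by a temperature (0.07 is an assumed value, not from the paper)."""
    rf = [l2_normalize(v) for v in rf_embs]
    tx = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(r, t)) / temperature for t in tx] for r in rf]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        top = max(row)  # subtract the max for numerical stability
        log_z = top + math.log(sum(math.exp(x - top) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_loss(rf_embs, text_embs, temperature=0.07):
    """Symmetric loss: RF-to-text and text-to-RF directions, averaged."""
    logits = similarity_logits(rf_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


def zero_shot_predict(rf_emb, label_embs, labels):
    """Return the label whose text embedding is closest to the RF embedding."""
    scores = similarity_logits([rf_emb], label_embs, temperature=1.0)[0]
    best = max(range(len(labels)), key=lambda i: scores[i])
    return labels[best]


# Toy 2-D embeddings; the gesture names are hypothetical placeholders.
labels = ["push", "swipe"]
text_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], text_embs)
misaligned_loss = contrastive_loss([[0.0, 1.0], [1.0, 0.0]], text_embs)
predicted = zero_shot_predict([0.9, 0.1], text_embs, labels)
```

In this sketch, matched RF-text pairs produce a much lower loss than mismatched ones. An unseen gesture can be classified by adding its textual description to `labels` and `text_embs`, without retraining the signal encoder.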

Original language: English
Title of host publication: Artificial Intelligence of Things and Systems - 3rd International Conference, AIoTSys 2025, Proceedings
Editors: Sicong Liu, Xiaolong Zheng, Dong Ma, Yuezhong Wu
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 174-189
Number of pages: 16
ISBN (Print): 9789819525805
State: Published - 2026
Event: 3rd International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2025 - Lanzhou, China
Duration: 15 Aug 2025 to 17 Aug 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2650 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 3rd International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2025
Country/Territory: China
City: Lanzhou
Period: 15/08/25 to 17/08/25

Keywords

  • Gesture Recognition
  • Vision Language Model
  • Wireless Sensing
  • Zero Shot Learning
