Enhancing Weakly Supervised Anomaly Detection in Surveillance Videos: The CLIP-Augmented Bimodal Memory Enhanced Network

Yinglong Wu, Zhaoyong Mao, Chenyang Yu, Guanglin Liu, Junge Shen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

To address the challenges of surveillance video anomaly detection (SVAD), especially the diversity and openness of its event types, we propose the CLIP-Augmented Bimodal Memory Enhanced Network for weakly supervised surveillance video anomaly detection. Specifically, we design a video feature extraction module based on CLIP features, which significantly improves the ability to capture the semantic content of surveillance videos. To handle the semantic diversity of abnormal events, we further design a Bimodal Memory Unit (BMMU) with two memory modules, one storing visual features and the other storing textual descriptive features, which enhances the model's ability to remember and distinguish various types of anomalous features. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on the UCF-Crime and XD-Violence benchmark datasets.
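
The abstract does not specify how the two memory modules are read, so the following is only a minimal sketch of one common choice, an attention-style read over a visual memory bank and a textual memory bank. The function names (`read_memory`, `bimodal_enhance`), the scaled dot-product similarity, the concatenation step, and the toy two-dimensional vectors standing in for CLIP features are all assumptions for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def read_memory(query, memory):
    # Attention-style read (assumed mechanism): weight each memory slot
    # by its scaled dot-product similarity to the query, then average.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, slot) / scale for slot in memory])
    dim = len(memory[0])
    return [sum(w * slot[i] for w, slot in zip(weights, memory)) for i in range(dim)]


def bimodal_enhance(feature, visual_memory, text_memory):
    # Hypothetical fusion: concatenate the snippet feature with its
    # reads from the visual bank and the textual bank.
    return (
        feature
        + read_memory(feature, visual_memory)
        + read_memory(feature, text_memory)
    )


# Toy stand-ins for CLIP-derived features (2-D for readability).
visual_memory = [[1.0, 0.0], [0.0, 1.0]]
text_memory = [[0.5, 0.5], [-0.5, 0.5]]
snippet = [2.0, 0.0]

enhanced = bimodal_enhance(snippet, visual_memory, text_memory)
```

In this sketch the snippet feature is closer to the first visual slot, so the visual read leans toward that slot; a trained model would learn the memory contents rather than fix them by hand.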

Original language: English
Title of host publication: 2024 18th International Conference on Control, Automation, Robotics and Vision, ICARCV 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 756-762
Number of pages: 7
ISBN (Electronic): 9798331518493
State: Published - 2024
Event: 18th International Conference on Control, Automation, Robotics and Vision, ICARCV 2024 - Dubai, United Arab Emirates
Duration: 12 Dec 2024 to 15 Dec 2024

