Abstract
Weakly supervised video anomaly detection (WS-VAD) is challenging because it relies on video-level binary annotations to make frame-level predictions. Existing methods often cast WS-VAD as a multiple instance learning (MIL) task, focusing on the isolated segments that contribute most to the classification while neglecting temporal context and fine-grained feature distinctions. In this paper, we propose a contrastive clustering strategy that enhances the representation of normal and abnormal features. Specifically, we treat the clustering center features and the features of their corresponding categories as positive sample pairs, while features from different categories are treated as negative samples. This approach enables the network to better explore the distinction between normal and abnormal features. Furthermore, we address the bias in pre-trained models, where I3D pre-trained features tend to overfit to normal videos and CLIP features exhibit a bias towards abnormal videos. To mitigate this, we introduce a simple early fusion method that combines the pre-trained features to reduce this bias and obtain more comprehensive spatio-temporal representations. Extensive experiments on the UCF-Crime and XD-Violence datasets demonstrate the effectiveness of our approach, which achieves state-of-the-art performance.
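The two ideas in the abstract, early fusion of pre-trained features and a contrastive objective built around cluster centers, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. It assumes that fusion is a per-segment concatenation of L2-normalized I3D and CLIP features, that segment-level labels (0 = normal, 1 = abnormal) are available as hypothetical pseudo-labels, and that the loss takes an InfoNCE-like form with a temperature. The function names `early_fusion` and `contrastive_clustering_loss`, the temperature value, and the feature dimensions in the usage note are all illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def early_fusion(i3d_feats, clip_feats):
    """Fuse two feature streams per segment (assumed form: concatenation).

    Each stream is L2-normalized first so that neither dominates by scale.
    i3d_feats: (N, D1), clip_feats: (N, D2) -> (N, D1 + D2)
    """
    return np.concatenate(
        [l2_normalize(i3d_feats), l2_normalize(clip_feats)], axis=-1
    )


def contrastive_clustering_loss(feats, labels, temperature=0.1):
    """InfoNCE-style loss around class centers (assumed form).

    For each category, the center is the mean of that category's features.
    (center, same-category feature) pairs are positives; features from
    other categories are negatives for that center.
    feats: (N, D) array, labels: (N,) integer array of pseudo-labels.
    """
    feats = l2_normalize(feats)
    losses = []
    for c in np.unique(labels):
        pos = feats[labels == c]
        neg = feats[labels != c]
        center = l2_normalize(pos.mean(axis=0))
        pos_sim = np.exp(pos @ center / temperature)        # shape (P,)
        neg_sum = np.exp(neg @ center / temperature).sum()  # scalar
        losses.append(-np.log(pos_sim / (pos_sim + neg_sum)).mean())
    return float(np.mean(losses))
```

For example, `early_fusion(np.zeros((4, 1024)), np.zeros((4, 512)))` returns an array of shape `(4, 1536)`; the dimensions are placeholders. The loss is small when features of each category sit close to their own center and far from the other category's features, and it grows as the two categories overlap.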
| Original language | English |
|---|---|
| Journal | Proceedings of the International Joint Conference on Neural Networks |
| State | Published - 2025 |
| Event | 2025 International Joint Conference on Neural Networks, IJCNN 2025, Rome, Italy (30 Jun 2025 → 5 Jul 2025) |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 16 Peace, Justice and Strong Institutions
Keywords
- Contrastive clustering
- Feature fusion
- Representation learning
- Video anomaly detection
Fingerprint
Dive into the research topics of 'Weakly Supervised Video Anomaly Detection Via Contrastive Clustering'. Together they form a unique fingerprint.