Abstract
With the widespread deployment of surveillance cameras, Weakly Supervised Video Anomaly Detection (WSVAD) has attracted increasing attention in many fields. Because it relies only on video-level labels for training, it significantly reduces labeling cost, which makes it valuable in practical applications. However, existing methods often depend on unimodal visual information and neglect the rich semantic information embedded in video description text. To address this limitation, this paper proposes a novel framework: Generative Description Boosted Weakly Supervised Video Anomaly Detection (DBVAD). DBVAD leverages large vision-language models as a knowledge engine to generate video descriptions, which are then used as semantic supervision signals to optimize visual features. DBVAD comprises several key components. First, a key event selection strategy accurately selects key frames from videos for subsequent description generation. Second, a temporal modeling module captures multi-scale temporal dependencies within videos. Lastly, a semantic focus prompt calibrates visual representations using label texts, while a description boosted module achieves fine-grained alignment between visual features and generated description text through contrastive learning, thereby enhancing the model’s semantic understanding of abnormal events. Experimental results indicate that DBVAD achieves superior performance on the large-scale UCF-Crime and XD-Violence datasets, validating its effectiveness.
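
The contrastive alignment between visual features and description text mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which loss is used, so the symmetric InfoNCE objective below is an assumption, as are the names `info_nce`, `visual`, `text`, and `temperature`. It is a generic illustration, not the authors' implementation.

```python
import numpy as np


def info_nce(visual, text, temperature=0.07):
    """Symmetric InfoNCE loss for N paired (visual, text) embeddings.

    Assumption: row i of `visual` and row i of `text` describe the same
    video segment; all other rows in the batch act as negatives.
    """
    # L2-normalise so the dot product equals cosine similarity.
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)

    # Pairwise similarity matrix of shape (N, N), scaled by temperature.
    logits = (v @ t.T) / temperature

    def diagonal_cross_entropy(z):
        # Numerically stable log-softmax over each row.
        z = z - z.max(axis=1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        # The matching pair sits on the diagonal.
        return -np.mean(np.diag(log_probs))

    # Average the visual-to-text and text-to-visual directions.
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(logits.T))


rng = np.random.default_rng(0)
visual_feats = rng.normal(size=(8, 16))
# Hypothetical text embeddings that lie close to their visual partners.
text_feats = visual_feats + 0.01 * rng.normal(size=(8, 16))

aligned_loss = info_nce(visual_feats, text_feats)
mismatched_loss = info_nce(visual_feats, text_feats[::-1])
print(aligned_loss < mismatched_loss)
```

Minimising a loss of this kind pulls each visual feature toward the embedding of its own description and pushes it away from the other descriptions in the batch, which is the general idea behind contrastive alignment.
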
| Original language | English |
|---|---|
| Title of host publication | Pattern Recognition and Computer Vision - 8th Chinese Conference, PRCV 2025, Proceedings |
| Editors | Josef Kittler, Hongkai Xiong, Weiyao Lin, Jian Yang, Xilin Chen, Jiwen Lu, Jingyi Yu, Weishi Zheng |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 358-372 |
| Number of pages | 15 |
| ISBN (Print) | 9789819555666 |
| DOI | |
| Publication status | Published - 2026 |
| Event | 8th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2025 - Shanghai, China. Duration: 15 Oct 2025 → 18 Oct 2025 |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Volume | 16276 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 8th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2025 |
|---|---|
| Country/Territory | China |
| City | Shanghai |
| Period | 15/10/25 → 18/10/25 |
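UN Sustainable Development Goals
This output contributes to the following Sustainable Development Goals:
- SDG 16: Peace, Justice and Strong Institutions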
Fingerprint
Dive into the research topics of 'Boosting Weakly Supervised Video Anomaly Detection with Generative Description'. Together they form a unique fingerprint.