
Boosting Weakly Supervised Video Anomaly Detection with Generative Description

  • Chenlin Meng
  • Zhaoyong Mao
  • Chi Zhang
  • Kai Jiang
  • Junge Shen
  • Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

With the extensive deployment of surveillance cameras, Weakly Supervised Video Anomaly Detection (WSVAD) has attracted increasing attention in many fields. Because it relies only on video-level labels for training, it significantly reduces labeling cost, which makes it valuable in practical applications. However, existing methods often depend on unimodal visual information and neglect the rich semantic information embedded in video description text. To address this limitation, this paper proposes a novel framework: Generative Description Boosted Weakly Supervised Video Anomaly Detection (DBVAD). DBVAD uses large vision-language models as a knowledge engine to generate video descriptions, which then serve as semantic supervision signals to optimize visual features. DBVAD comprises several key components. First, a key event selection strategy selects key frames from videos for subsequent description generation. Second, a temporal modeling module captures multi-scale temporal dependencies within videos. Lastly, a semantic focus prompt calibrates visual representations using label texts, while a description boosted module achieves fine-grained alignment between visual features and the generated description text through contrastive learning, thereby enhancing the model’s semantic understanding of abnormal events. Experimental results indicate that DBVAD achieves superior performance on the large-scale UCF-Crime and XD-Violence datasets, validating its effectiveness.
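The abstract states that visual features are aligned with generated description text through contrastive learning but does not give the loss. As an illustration only, the sketch below shows a generic symmetric InfoNCE objective (the kind used in CLIP-style training), not necessarily the loss used in DBVAD. The function names, the temperature of 0.07, the embedding size, and the random stand-in embeddings are all assumptions made for this example.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(visual, text, temperature=0.07):
    """Symmetric InfoNCE loss; row i of `visual` is paired with row i of `text`.

    Hypothetical helper for illustration; the temperature is an assumed value.
    """
    v = l2_normalize(visual)
    t = l2_normalize(text)
    logits = v @ t.T / temperature      # (batch, batch) similarity matrix
    diag = np.arange(len(v))            # matched pairs sit on the diagonal
    loss_v2t = -log_softmax(logits, axis=1)[diag, diag].mean()  # visual -> text
    loss_t2v = -log_softmax(logits, axis=0)[diag, diag].mean()  # text -> visual
    return 0.5 * (loss_v2t + loss_t2v)


# Random stand-ins for real embeddings (assumption: 4 videos, 16-dim features).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(4, 16))                    # description embeddings
aligned = text_emb + 0.05 * rng.normal(size=(4, 16))   # visual features near their text
shuffled = aligned[::-1]                               # pairs deliberately mismatched

loss_aligned = symmetric_contrastive_loss(aligned, text_emb)
loss_shuffled = symmetric_contrastive_loss(shuffled, text_emb)
print(loss_aligned < loss_shuffled)
```

Minimizing a loss of this form pulls each video's visual feature toward the embedding of its own description and pushes it away from the descriptions of other videos in the batch, which is why the correctly paired batch scores lower than the mismatched one.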

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 8th Chinese Conference, PRCV 2025, Proceedings
Editors: Josef Kittler, Hongkai Xiong, Weiyao Lin, Jian Yang, Xilin Chen, Jiwen Lu, Jingyi Yu, Weishi Zheng
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 358-372
Number of pages: 15
ISBN (Print): 9789819555666
DOI
Publication status: Published - 2026
Event: 8th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2025 - Shanghai, China
Duration: 15 Oct 2025 to 18 Oct 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16276 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2025
Country/Territory: China
City: Shanghai
Period: 15/10/25 to 18/10/25

UN Sustainable Development Goals

This output contributes to the following Sustainable Development Goals:

  1. SDG 16 - Peace, Justice and Strong Institutions
