Scene-Dependent Prediction in Latent Space for Video Anomaly Detection and Anticipation

Congqi Cao; Hanwen Zhang; Yue Lu; Peng Wang; Yanning Zhang

doi:10.1109/TPAMI.2024.3461718

School of Computer Science

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Video anomaly detection (VAD) plays a crucial role in intelligent surveillance. However, an essential type of anomaly named scene-dependent anomaly is overlooked. Moreover, the task of video anomaly anticipation (VAA) also deserves attention. To fill these gaps, we build a comprehensive dataset named NWPU Campus, which is the largest semi-supervised VAD dataset and the first dataset for scene-dependent VAD and VAA. Meanwhile, we introduce a novel forward-backward framework for scene-dependent VAD and VAA, in which the forward network solves VAD on its own and solves VAA jointly with the backward network. Particularly, we propose a scene-dependent generative model in latent space for the forward and backward networks. First, we propose a hierarchical variational auto-encoder to extract scene-generic features. Next, we design a score-based diffusion model in latent space to refine these features into a more compact form for the task and generate scene-dependent features with a scene information auto-encoder, modeling the relationships between video events and scenes. Finally, we develop a temporal loss from key frames to constrain the motion consistency of video clips. Extensive experiments demonstrate that our method can handle both scene-dependent anomaly detection and anticipation well, achieving state-of-the-art performance on ShanghaiTech, CUHK Avenue, and the proposed NWPU Campus datasets.

Original language	English
Pages (from-to)	224-239
Number of pages	16
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	47
Issue number	1
DOI	https://doi.org/10.1109/TPAMI.2024.3461718
Publication status	Published - 2025

Cite this

@article{b2c8e15c94a84cf39c18504f3beccd71,

title = "Scene-Dependent Prediction in Latent Space for Video Anomaly Detection and Anticipation",

abstract = "Video anomaly detection (VAD) plays a crucial role in intelligent surveillance. However, an essential type of anomaly named scene-dependent anomaly is overlooked. Moreover, the task of video anomaly anticipation (VAA) also deserves attention. To fill these gaps, we build a comprehensive dataset named NWPU Campus, which is the largest semi-supervised VAD dataset and the first dataset for scene-dependent VAD and VAA. Meanwhile, we introduce a novel forward-backward framework for scene-dependent VAD and VAA, in which the forward network individually solves the VAD and jointly solves the VAA with the backward network. Particularly, we propose a scene-dependent generative model in latent space for the forward and backward networks. First, we propose a hierarchical variational auto-encoder to extract scene-generic features. Next, we design a score-based diffusion model in latent space to refine these features more compact for the task and generate scene-dependent features with a scene information auto-encoder, modeling the relationships between video events and scenes. Finally, we develop a temporal loss from key frames to constrain the motion consistency of video clips. Extensive experiments demonstrate that our method can handle both scene-dependent anomaly detection and anticipation well, achieving state-of-the-art performance on ShanghaiTech, CUHK Avenue, and the proposed NWPU Campus datasets.",

keywords = "Scene-dependent anomaly, diffusion models, prediction network, video anomaly detection and anticipation",

author = "Congqi Cao and Hanwen Zhang and Yue Lu and Peng Wang and Yanning Zhang",

note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",

year = "2025",

doi = "10.1109/TPAMI.2024.3461718",

language = "English",

volume = "47",

pages = "224--239",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

number = "1",

}

TY - JOUR

T1 - Scene-Dependent Prediction in Latent Space for Video Anomaly Detection and Anticipation

AU - Cao, Congqi

AU - Zhang, Hanwen

AU - Lu, Yue

AU - Wang, Peng

AU - Zhang, Yanning

PY - 2025

Y1 - 2025

N2 - Video anomaly detection (VAD) plays a crucial role in intelligent surveillance. However, an essential type of anomaly named scene-dependent anomaly is overlooked. Moreover, the task of video anomaly anticipation (VAA) also deserves attention. To fill these gaps, we build a comprehensive dataset named NWPU Campus, which is the largest semi-supervised VAD dataset and the first dataset for scene-dependent VAD and VAA. Meanwhile, we introduce a novel forward-backward framework for scene-dependent VAD and VAA, in which the forward network individually solves the VAD and jointly solves the VAA with the backward network. Particularly, we propose a scene-dependent generative model in latent space for the forward and backward networks. First, we propose a hierarchical variational auto-encoder to extract scene-generic features. Next, we design a score-based diffusion model in latent space to refine these features more compact for the task and generate scene-dependent features with a scene information auto-encoder, modeling the relationships between video events and scenes. Finally, we develop a temporal loss from key frames to constrain the motion consistency of video clips. Extensive experiments demonstrate that our method can handle both scene-dependent anomaly detection and anticipation well, achieving state-of-the-art performance on ShanghaiTech, CUHK Avenue, and the proposed NWPU Campus datasets.

AB - Video anomaly detection (VAD) plays a crucial role in intelligent surveillance. However, an essential type of anomaly named scene-dependent anomaly is overlooked. Moreover, the task of video anomaly anticipation (VAA) also deserves attention. To fill these gaps, we build a comprehensive dataset named NWPU Campus, which is the largest semi-supervised VAD dataset and the first dataset for scene-dependent VAD and VAA. Meanwhile, we introduce a novel forward-backward framework for scene-dependent VAD and VAA, in which the forward network individually solves the VAD and jointly solves the VAA with the backward network. Particularly, we propose a scene-dependent generative model in latent space for the forward and backward networks. First, we propose a hierarchical variational auto-encoder to extract scene-generic features. Next, we design a score-based diffusion model in latent space to refine these features more compact for the task and generate scene-dependent features with a scene information auto-encoder, modeling the relationships between video events and scenes. Finally, we develop a temporal loss from key frames to constrain the motion consistency of video clips. Extensive experiments demonstrate that our method can handle both scene-dependent anomaly detection and anticipation well, achieving state-of-the-art performance on ShanghaiTech, CUHK Avenue, and the proposed NWPU Campus datasets.

KW - Scene-dependent anomaly

KW - diffusion models

KW - prediction network

KW - video anomaly detection and anticipation

UR - http://www.scopus.com/inward/record.url?scp=85204453216&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2024.3461718

DO - 10.1109/TPAMI.2024.3461718

M3 - Article

AN - SCOPUS:85204453216

SN - 0162-8828

VL - 47

SP - 224

EP - 239

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 1

ER -
