Scene-Dependent Prediction in Latent Space for Video Anomaly Detection and Anticipation

Congqi Cao; Hanwen Zhang; Yue Lu; Peng Wang; Yanning Zhang

doi:10.1109/TPAMI.2024.3461718

School of Computer Science

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Video anomaly detection (VAD) plays a crucial role in intelligent surveillance. However, an essential type of anomaly named scene-dependent anomaly is overlooked. Moreover, the task of video anomaly anticipation (VAA) also deserves attention. To fill these gaps, we build a comprehensive dataset named NWPU Campus, which is the largest semi-supervised VAD dataset and the first dataset for scene-dependent VAD and VAA. Meanwhile, we introduce a novel forward-backward framework for scene-dependent VAD and VAA, in which the forward network solves VAD on its own and solves VAA jointly with the backward network. In particular, we propose a scene-dependent generative model in latent space for the forward and backward networks. First, we propose a hierarchical variational auto-encoder to extract scene-generic features. Next, we design a score-based diffusion model in latent space to refine these features into a more compact form for the task and to generate scene-dependent features with a scene information auto-encoder, modeling the relationships between video events and scenes. Finally, we develop a temporal loss from key frames to constrain the motion consistency of video clips. Extensive experiments demonstrate that our method can handle both scene-dependent anomaly detection and anticipation well, achieving state-of-the-art performance on ShanghaiTech, CUHK Avenue, and the proposed NWPU Campus datasets.

Original language	English
Pages (from-to)	224-239
Number of pages	16
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	47
Issue number	1
DOIs	https://doi.org/10.1109/TPAMI.2024.3461718
State	Published - 2025

Keywords

Scene-dependent anomaly
diffusion models
prediction network
video anomaly detection and anticipation

Access to Document

10.1109/TPAMI.2024.3461718

Cite this

@article{b2c8e15c94a84cf39c18504f3beccd71,

title = "Scene-Dependent Prediction in Latent Space for Video Anomaly Detection and Anticipation",

abstract = "Video anomaly detection (VAD) plays a crucial role in intelligent surveillance. However, an essential type of anomaly named scene-dependent anomaly is overlooked. Moreover, the task of video anomaly anticipation (VAA) also deserves attention. To fill these gaps, we build a comprehensive dataset named NWPU Campus, which is the largest semi-supervised VAD dataset and the first dataset for scene-dependent VAD and VAA. Meanwhile, we introduce a novel forward-backward framework for scene-dependent VAD and VAA, in which the forward network individually solves the VAD and jointly solves the VAA with the backward network. Particularly, we propose a scene-dependent generative model in latent space for the forward and backward networks. First, we propose a hierarchical variational auto-encoder to extract scene-generic features. Next, we design a score-based diffusion model in latent space to refine these features more compact for the task and generate scene-dependent features with a scene information auto-encoder, modeling the relationships between video events and scenes. Finally, we develop a temporal loss from key frames to constrain the motion consistency of video clips. Extensive experiments demonstrate that our method can handle both scene-dependent anomaly detection and anticipation well, achieving state-of-the-art performance on ShanghaiTech, CUHK Avenue, and the proposed NWPU Campus datasets.",

keywords = "Scene-dependent anomaly, diffusion models, prediction network, video anomaly detection and anticipation",

author = "Congqi Cao and Hanwen Zhang and Yue Lu and Peng Wang and Yanning Zhang",

note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",

year = "2025",

doi = "10.1109/TPAMI.2024.3461718",

language = "English",

volume = "47",

pages = "224--239",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

number = "1",

}

TY - JOUR

T1 - Scene-Dependent Prediction in Latent Space for Video Anomaly Detection and Anticipation

AU - Cao, Congqi

AU - Zhang, Hanwen

AU - Lu, Yue

AU - Wang, Peng

AU - Zhang, Yanning

PY - 2025

Y1 - 2025

N2 - Video anomaly detection (VAD) plays a crucial role in intelligent surveillance. However, an essential type of anomaly named scene-dependent anomaly is overlooked. Moreover, the task of video anomaly anticipation (VAA) also deserves attention. To fill these gaps, we build a comprehensive dataset named NWPU Campus, which is the largest semi-supervised VAD dataset and the first dataset for scene-dependent VAD and VAA. Meanwhile, we introduce a novel forward-backward framework for scene-dependent VAD and VAA, in which the forward network individually solves the VAD and jointly solves the VAA with the backward network. Particularly, we propose a scene-dependent generative model in latent space for the forward and backward networks. First, we propose a hierarchical variational auto-encoder to extract scene-generic features. Next, we design a score-based diffusion model in latent space to refine these features more compact for the task and generate scene-dependent features with a scene information auto-encoder, modeling the relationships between video events and scenes. Finally, we develop a temporal loss from key frames to constrain the motion consistency of video clips. Extensive experiments demonstrate that our method can handle both scene-dependent anomaly detection and anticipation well, achieving state-of-the-art performance on ShanghaiTech, CUHK Avenue, and the proposed NWPU Campus datasets.

KW - Scene-dependent anomaly

KW - diffusion models

KW - prediction network

KW - video anomaly detection and anticipation

UR - http://www.scopus.com/inward/record.url?scp=85204453216&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2024.3461718

DO - 10.1109/TPAMI.2024.3461718

M3 - Article

AN - SCOPUS:85204453216

SN - 0162-8828

VL - 47

SP - 224

EP - 239

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 1

ER -
