Context Recovery and Knowledge Retrieval: A Novel Two-Stream Framework for Video Anomaly Detection

Congqi Cao; Yue Lu; Yanning Zhang

doi:10.1109/TIP.2024.3372466

School of Computer Science

Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

Video anomaly detection aims to find events in a video that do not conform to expected behavior. Prevalent methods mainly detect anomalies through snippet reconstruction error or future frame prediction error. However, this error is highly dependent on the local context of the current snippet and lacks an understanding of normality. To address this issue, we propose to detect anomalous events not only from the local context, but also from the consistency between the testing event and the knowledge about normality drawn from the training data. Concretely, we propose a novel two-stream framework based on context recovery and knowledge retrieval, in which the two streams complement each other. For the context recovery stream, we propose a spatiotemporal U-Net that can fully utilize motion information to predict the future frame. Furthermore, we propose a maximum local error mechanism to alleviate the problem of large recovery errors caused by complex foreground objects. For the knowledge retrieval stream, we propose an improved learnable locality-sensitive hashing method, which optimizes the hash functions via a Siamese network and a mutual difference loss. The knowledge about normality is encoded and stored in hash tables, and the distance between the testing event and the knowledge representation is used to indicate the probability of an anomaly. Finally, we fuse the anomaly scores from the two streams to detect anomalies. Extensive experiments demonstrate the effectiveness and complementarity of the two streams: among methods that do not use object detection, the proposed two-stream framework achieves state-of-the-art performance on the ShanghaiTech, Avenue, and Corridor datasets. Even compared with methods that use object detection, our method achieves competitive or better performance on the ShanghaiTech, Avenue, and Ped2 datasets.
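The abstract describes the scoring pipeline only at a high level. The Python sketch below illustrates its three scoring ideas under stated assumptions; it is not the authors' implementation. The names `max_local_error`, `RandomHyperplaneLSH`, `min_max`, and `fuse`, as well as the patch size, table and bit counts, and fusion weight `w`, are hypothetical. The retrieval stream here uses fixed random-hyperplane (SimHash-style) locality-sensitive hashing, a plainly substituted technique: the paper instead learns its hash functions with a Siamese network and a mutual difference loss. The maximum local error is read here as scoring a frame by its worst non-overlapping patch, which may differ from the paper's actual mechanism, and per-video min-max normalization followed by a weighted sum is a common convention assumed here, not taken from the paper.

```python
import math
import random
from typing import Dict, List, Sequence

Frame = List[List[float]]


def max_local_error(pred: Frame, target: Frame, patch: int = 4) -> float:
    """Assumed reading of 'maximum local error': MSE of the worst non-overlapping patch.

    The patch size is a hypothetical parameter.
    """
    h, w = len(target), len(target[0])
    worst = 0.0
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            errs = [
                (pred[y][x] - target[y][x]) ** 2
                for y in range(top, min(top + patch, h))
                for x in range(left, min(left + patch, w))
            ]
            worst = max(worst, sum(errs) / len(errs))
    return worst


class RandomHyperplaneLSH:
    """Hypothetical memory of normal features indexed by sign-of-projection hashes.

    Uses FIXED random hyperplanes (SimHash); the paper learns its hash functions instead.
    """

    def __init__(self, dim: int, n_tables: int = 4, n_bits: int = 8, seed: int = 0):
        rng = random.Random(seed)
        # One set of n_bits random hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables: List[Dict[tuple, List[int]]] = [{} for _ in range(n_tables)]
        self.items: List[List[float]] = []

    def _key(self, t: int, v: Sequence[float]) -> tuple:
        # Bucket key: which side of each hyperplane in table t the vector falls on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0.0 for plane in self.planes[t])

    def add(self, v: Sequence[float]) -> None:
        idx = len(self.items)
        self.items.append(list(v))
        for t, table in enumerate(self.tables):
            table.setdefault(self._key(t, v), []).append(idx)

    def distance(self, q: Sequence[float]) -> float:
        """Euclidean distance from q to the nearest stored normal feature among bucket candidates."""
        cand = set()
        for t, table in enumerate(self.tables):
            cand.update(table.get(self._key(t, q), []))
        pool = cand or range(len(self.items))  # no bucket hit: fall back to exhaustive search
        return min(math.dist(q, self.items[i]) for i in pool)


def min_max(scores: Sequence[float]) -> List[float]:
    """Normalize one video's scores to [0, 1] (assumed per-video convention)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(recovery: Sequence[float], retrieval: Sequence[float], w: float = 0.5) -> List[float]:
    """Weighted sum of the two normalized stream scores; w is a hypothetical weight."""
    return [w * a + (1.0 - w) * b for a, b in zip(min_max(recovery), min_max(retrieval))]
```

For a 4×4 all-zero prediction against a target with a single pixel of value 2.0, `max_local_error(..., patch=2)` returns 1.0 for the affected 2×2 patch, whereas the frame-wide mean squared error would be only 0.25; this is the intuition behind scoring by the worst local region rather than a global average. In `RandomHyperplaneLSH`, a query that matches a stored normal feature lands in the same buckets and gets distance 0.0, while unfamiliar features receive larger distances.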

Original language	English
Pages (from-to)	1810-1825
Number of pages	16
Journal	IEEE Transactions on Image Processing
Volume	33
DOIs	https://doi.org/10.1109/TIP.2024.3372466
State	Published - 2024

Keywords

Video anomaly detection
context recovery
knowledge retrieval
two-stream framework

Access to Document

10.1109/TIP.2024.3372466

Cite this

@article{5b48ff2fe01a40a3a3c0e015a5621c9d,

title = "Context Recovery and Knowledge Retrieval: A Novel Two-Stream Framework for Video Anomaly Detection",

keywords = "Video anomaly detection, context recovery, knowledge retrieval, two-stream framework",

author = "Congqi Cao and Yue Lu and Yanning Zhang",

note = "Publisher Copyright: {\textcopyright} 1992-2012 IEEE.",

year = "2024",

doi = "10.1109/TIP.2024.3372466",

language = "English",

volume = "33",

pages = "1810--1825",

journal = "IEEE Transactions on Image Processing",

issn = "1057-7149",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Context Recovery and Knowledge Retrieval

T2 - A Novel Two-Stream Framework for Video Anomaly Detection

AU - Cao, Congqi

AU - Lu, Yue

AU - Zhang, Yanning

PY - 2024

Y1 - 2024

AB - Video anomaly detection aims to find the events in a video that do not conform to the expected behavior. The prevalent methods mainly detect anomalies by snippet reconstruction or future frame prediction error. However, the error is highly dependent on the local context of the current snippet and lacks the understanding of normality. To address this issue, we propose to detect anomalous events not only by the local context, but also according to the consistency between the testing event and the knowledge about normality from the training data. Concretely, we propose a novel two-stream framework based on context recovery and knowledge retrieval, where the two streams can complement each other. For the context recovery stream, we propose a spatiotemporal U-Net which can fully utilize the motion information to predict the future frame. Furthermore, we propose a maximum local error mechanism to alleviate the problem of large recovery errors caused by complex foreground objects. For the knowledge retrieval stream, we propose an improved learnable locality-sensitive hashing, which optimizes hash functions via a Siamese network and a mutual difference loss. The knowledge about normality is encoded and stored in hash tables, and the distance between the testing event and the knowledge representation is used to reveal the probability of anomaly. Finally, we fuse the anomaly scores from the two streams to detect anomalies. Extensive experiments demonstrate the effectiveness and complementarity of the two streams, whereby the proposed two-stream framework achieves state-of-the-art performance on ShanghaiTech, Avenue and Corridor datasets among the methods without object detection. Even if compared with the methods using object detection, our method reaches competitive or better performance on the ShanghaiTech, Avenue, and Ped2 datasets.

KW - Video anomaly detection

KW - context recovery

KW - knowledge retrieval

KW - two-stream framework

UR - http://www.scopus.com/inward/record.url?scp=85187399677&partnerID=8YFLogxK

U2 - 10.1109/TIP.2024.3372466

DO - 10.1109/TIP.2024.3372466

M3 - Article

C2 - 38451764

AN - SCOPUS:85187399677

SN - 1057-7149

VL - 33

SP - 1810

EP - 1825

JO - IEEE Transactions on Image Processing

JF - IEEE Transactions on Image Processing

ER -