TY - JOUR
T1 - Learnable Locality-Sensitive Hashing for Video Anomaly Detection
AU - Lu, Yue
AU - Cao, Congqi
AU - Zhang, Yifan
AU - Zhang, Yanning
N1 - Publisher Copyright: © 1991-2012 IEEE.
PY - 2023/2/1
Y1 - 2023/2/1
AB - Video anomaly detection (VAD) mainly refers to identifying anomalous events that have not occurred in the training set, where only normal samples are available. Existing works usually formulate VAD as a reconstruction or prediction problem. However, the adaptability and scalability of these methods are limited. In this paper, we propose a novel distance-based VAD method that takes advantage of all the available normal data efficiently and flexibly. In our method, the smaller the distance between a testing sample and the normal samples, the higher the probability that the testing sample is normal. Specifically, we propose to use locality-sensitive hashing (LSH) to map samples whose similarity exceeds a certain threshold into the same bucket in advance. To utilize multiple hashes and further reduce computation and memory usage, we propose to use the hash codes rather than the features as the representations of the samples. In this manner, the complexity of near-neighbor search is cut down significantly. To pull semantically similar samples closer together and push dissimilar ones further apart, we propose a novel learnable version of LSH that embeds LSH into a neural network and optimizes the hash functions with a contrastive learning strategy. The proposed method is robust to data imbalance and can flexibly handle the large intra-class variations in normal data. It also scales well. Extensive experiments demonstrate the superiority of our method, which achieves new state-of-the-art results on VAD benchmarks.
KW - Video anomaly detection
KW - distance-based
KW - unsupervised
KW - video analysis and understanding
UR - http://www.scopus.com/inward/record.url?scp=85137920573&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2022.3205348
DO - 10.1109/TCSVT.2022.3205348
M3 - Article
AN - SCOPUS:85137920573
SN - 1051-8215
VL - 33
SP - 963
EP - 976
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 2
ER -