TY - JOUR
T1 - 3D differential decomposition for video deepfake detection with identity suppression
AU - Gao, Jie
AU - Micheletto, Marco
AU - Orrù, Giulia
AU - Feng, Xiaoyi
AU - Marcialis, Gian Luca
N1 - Publisher Copyright: © 2026 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license. http://creativecommons.org/licenses/by/4.0/
PY - 2026/5
Y1 - 2026/5
N2 - Detecting deepfake videos remains a challenging task, especially in scenarios involving unknown manipulation methods or unseen data distributions. Most existing video deepfake detection methods rely on high-level semantic features, which often lead to overfitting of facial identity information and poor transferability. In this work, we explore a novel perspective by modeling videos through 3D differential operations along temporal and spatial dimensions. To exploit the spatial–temporal variation information of the video content, the proposed approach decomposes videos into single-axis 1D differential signals, which are then transformed into 2D representations for efficient learning. This procedure enables the use of lightweight 2D CNNs while retaining directional forgery cues. Our experiments, aimed at analyzing whether these differential signals capture discriminative patterns useful for distinguishing real from fake content, show that the proposed method achieves strong intra-dataset performance and reveals complementary information across dimensions. These findings suggest that differential signals could potentially support generalization when integrated into broader detection frameworks.
AB - Detecting deepfake videos remains a challenging task, especially in scenarios involving unknown manipulation methods or unseen data distributions. Most existing video deepfake detection methods rely on high-level semantic features, which often lead to overfitting of facial identity information and poor transferability. In this work, we explore a novel perspective by modeling videos through 3D differential operations along temporal and spatial dimensions. To exploit the spatial–temporal variation information of the video content, the proposed approach decomposes videos into single-axis 1D differential signals, which are then transformed into 2D representations for efficient learning. This procedure enables the use of lightweight 2D CNNs while retaining directional forgery cues. Our experiments, aimed at analyzing whether these differential signals capture discriminative patterns useful for distinguishing real from fake content, show that the proposed method achieves strong intra-dataset performance and reveals complementary information across dimensions. These findings suggest that differential signals could potentially support generalization when integrated into broader detection frameworks.
KW - 3D differential modeling
KW - Identity suppression
KW - Video deepfake detection
UR - https://www.scopus.com/pages/publications/105032723382
U2 - 10.1016/j.image.2026.117525
DO - 10.1016/j.image.2026.117525
M3 - Article
AN - SCOPUS:105032723382
SN - 0923-5965
VL - 144
JO - Signal Processing: Image Communication
JF - Signal Processing: Image Communication
M1 - 117525
ER -