UPGP: Backdoor defense via unlearning perturbation and orthogonality-constraint gradient projection

  • Jingtai Li
  • Xiujiu Yuan
  • Jiwei Tian
  • Shiwei Lu
  • Dengxiu Yu

Research output: Contribution to journal › Article › peer-review

Abstract

In the field of artificial intelligence, the wide adoption of third-party data has heightened the risk of backdoor attacks based on data poisoning. Although post-training defenses can mitigate such attacks, aggressive strategies often degrade the model's performance on its main task. To address this, a novel method that combines backdoor detection and elimination through machine unlearning is proposed. Specifically, unlearning perturbation is first defined to capture the parameter variation induced by forgetting a subset of samples. Subsequently, experiments confirm that backdoor samples exhibit lower sensitivity to perturbations generated from normal samples. In addition, a learning-dynamics analysis attributes this discrepancy to unlearning sensitivity, which is defined as the inner product between the gradients of normal and backdoor samples. This analysis further demonstrates that this metric quantifies the extent to which backdoor removal perturbs the model's main task. Leveraging this insight, an orthogonality-constrained gradient projection method projects the unlearning gradient onto the null space of the normal-sample gradient, thereby eliminating the aforementioned unlearning sensitivity and preserving the accuracy of normal samples. The proposed method is evaluated across six backdoor attack scenarios and two network architectures, reducing the average attack success rate by 96.34 percentage points and improving robust accuracy by 83.68 percentage points, while maintaining the model's performance on the main task.
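The core operation described in the abstract, projecting the unlearning gradient onto the null space of the normal-sample gradient so that the two gradients have zero inner product, can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a single flattened normal-sample gradient and uses the standard orthogonal-projection formula (subtracting the component along the normal gradient).

```python
import numpy as np

def project_to_null_space(g_unlearn, g_normal):
    """Remove the component of the unlearning gradient that lies along
    the normal-sample gradient, so the resulting update is orthogonal
    to it (their inner product, the 'unlearning sensitivity', is zero)."""
    g_unlearn = np.asarray(g_unlearn, dtype=float)
    g_normal = np.asarray(g_normal, dtype=float)
    # Projection coefficient <g_u, g_n> / <g_n, g_n>, with a small
    # epsilon to guard against a vanishing normal gradient.
    coeff = (g_unlearn @ g_normal) / (g_normal @ g_normal + 1e-12)
    return g_unlearn - coeff * g_normal
```

Applying the projected gradient instead of the raw unlearning gradient leaves the model's behavior along the normal-sample gradient direction unchanged to first order, which is the mechanism the abstract credits for preserving main-task accuracy.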

Original language: English
Article number: 113211
Journal: Pattern Recognition
Volume: 176
DOIs
State: Published - Aug 2026

Keywords

  • Backdoor defense
  • Data security
  • Gradient projection
  • Learning dynamic
  • Machine unlearning
