Relaxed Knowledge Distillation

Research output: Contribution to journal › Article › peer-review

Abstract

Knowledge distillation, which aims to improve a compact student model using supervision from a cumbersome teacher model, has become a prevalent technique for model compression across various computer vision tasks. Existing methods mainly adopt one-to-one knowledge transfer, where the student model is forced to match a specific target provided by the teacher model. However, the performance of this training paradigm deteriorates as the capacity gap between the models widens, since high-level teacher knowledge is too abstract for low-capacity student models to absorb. Motivated by this, we propose a novel feature-based knowledge distillation framework, dubbed ReKD, which provides the student model with multiple choices in feature distillation, thereby relaxing the alignment process in knowledge transfer. Specifically, we transform the teacher features into latent variables through variational inference, with the posterior following a Gaussian distribution. This renders the feature knowledge as a region rather than a specific point in the distillation space, enabling the student features to adaptively select suitable distillation targets from the learned distribution. Furthermore, to ensure the quality of the latent variables, we use the student features as a prior to regularize the posterior in reverse, inspired by mutual learning. Experimental results on three typical visual recognition datasets, i.e., CIFAR-100, ImageNet-1K, and MS-COCO, demonstrate the superiority of the proposed method.
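
The abstract outlines the core mechanism: teacher features are encoded into a Gaussian posterior via variational inference, the student aligns against samples drawn from that region rather than a fixed point, and the student features serve as a prior that regularizes the posterior in reverse. Below is a minimal, hypothetical PyTorch sketch of that idea; all module names, dimensions, and loss weights are assumptions for illustration only and do not reproduce the paper's actual architecture or training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelaxedFeatureDistiller(nn.Module):
    """Hypothetical sketch of region-based feature distillation:
    teacher features -> Gaussian posterior; student aligns to samples
    from that region; student features act as a reverse prior."""

    def __init__(self, teacher_dim: int, student_dim: int, latent_dim: int):
        super().__init__()
        # Posterior network q(z | f_t): teacher feature -> Gaussian parameters
        self.post_mu = nn.Linear(teacher_dim, latent_dim)
        self.post_logvar = nn.Linear(teacher_dim, latent_dim)
        # Projection of the student feature into the same latent space
        self.student_proj = nn.Linear(student_dim, latent_dim)

    def forward(self, f_teacher: torch.Tensor, f_student: torch.Tensor):
        mu = self.post_mu(f_teacher)
        logvar = self.post_logvar(f_teacher)
        # Reparameterization: sample a distillation target from the region
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

        s = self.student_proj(f_student)
        # Align the student feature with the sampled target instead of a fixed point
        align_loss = F.mse_loss(s, z.detach())

        # Reverse regularization (mutual-learning flavor): pull the posterior
        # toward a prior centered at the detached student projection,
        # KL( N(mu, sigma^2) || N(s, I) ) up to additive constants
        kl = 0.5 * (torch.exp(logvar) + (mu - s.detach()) ** 2 - 1.0 - logvar)
        kl_loss = kl.sum(dim=1).mean()

        return align_loss, kl_loss


# Example usage with dummy pooled features (batch of 8); the 0.1 weight is a placeholder
distiller = RelaxedFeatureDistiller(teacher_dim=2048, student_dim=512, latent_dim=256)
f_t = torch.randn(8, 2048)
f_s = torch.randn(8, 512)
align_loss, kl_loss = distiller(f_t, f_s)
total_loss = align_loss + 0.1 * kl_loss
```

In this sketch the sampled target `z` gives the student a relaxed, stochastic alignment objective, while the KL term keeps the teacher-derived posterior anchored near what the student can currently represent; how these terms are weighted and combined with the task loss is left to the paper itself.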

Original language: English
Article number: 101
Journal: International Journal of Computer Vision
Volume: 134
Issue number: 3
DOIs
State: Published - Mar 2026

Keywords

  • Feature-based knowledge distillation
  • Model compression
  • Mutual learning
  • Variational inference

