Semantic-aware knowledge distillation with parameter-free feature uniformization

Guangyu Guo, Longfei Han, Le Wang, Dingwen Zhang, Junwei Han

Research output: Contribution to journal · Article · peer-review

28 Scopus citations

Abstract

Knowledge distillation aims to transfer knowledge from a teacher network to train a student network. Distilling intermediate features has attracted much attention in recent years, as it can be applied flexibly in fields such as image classification, object detection, and semantic segmentation. A critical obstacle for feature-based knowledge distillation is the dimension gap between the intermediate features of the teacher and the student, and many methods have been proposed to resolve it. However, these works usually perform feature uniformization in an unsupervised way, lacking guidance to help the student network learn meaningful mapping functions during uniformization. Moreover, the dimension uniformization processes of the student and teacher networks are usually not equivalent, since their mapping functions differ: some factors of the features are discarded during parametric feature alignment, or blended together by non-parametric operations. In this paper, we propose a novel semantic-aware knowledge distillation scheme to address these problems. We build a standalone feature-based classification branch to extract semantic-aware knowledge that better guides the learning of the student network. In addition, to avoid inequivalent feature uniformization between teacher and student, we design a novel parameter-free self-attention operation that converts features of different dimensions into vectors of the same length. Experimental results show that the proposed scheme outperforms existing feature-based distillation methods on the widely used CIFAR-100 and CINIC-10 datasets.
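The abstract does not spell out the uniformization operation, but the core idea (a parameter-free mapping from feature maps of arbitrary dimensions to fixed-length vectors, applied identically to teacher and student) can be illustrated with a minimal sketch. The sketch below is a hypothetical construction, not the paper's method: it uses the feature map's own global-average vector as an attention query to pool spatial positions, then parameter-free average pooling over channels to reach a fixed output length. The function name `uniformize`, the query choice, and the output length are all assumptions for illustration.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def uniformize(feat, out_len=64):
    """Hypothetical parameter-free uniformization sketch (not the paper's exact op).

    Maps a (C, H, W) feature map to a fixed-length vector with no learnable
    weights, so the same mapping applies to teacher and student regardless of
    their feature dimensions. Assumes C >= out_len.
    """
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)             # (C, N): channels x spatial positions
    query = x.mean(axis=1)                 # (C,): global-average feature as the query
    scores = query @ x / np.sqrt(C)        # (N,): scaled similarity of each position to the query
    attn = softmax(scores)                 # spatial self-attention weights, no parameters
    pooled = x @ attn                      # (C,): attention-weighted spatial pooling
    # parameter-free adaptive average pooling over channels to a fixed length
    edges = np.linspace(0, C, out_len + 1).astype(int)
    return np.array([pooled[edges[i]:edges[i + 1]].mean() for i in range(out_len)])
```

Because the operation contains no learned parameters, teacher and student features of different channel counts and spatial sizes land in the same vector space through an identical mapping, which is the equivalence property the abstract argues parametric alignment lacks.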

Original language: English
Article number: 6
Journal: Visual Intelligence
Volume: 1
Issue number: 1
DOIs
State: Published - Dec 2023

Keywords

  • Feature uniformization
  • Feature-based
  • Knowledge distillation
  • Parameter-free
  • Self-attention
  • Semantic-aware
