TY - JOUR
T1 - Semantic-aware knowledge distillation with parameter-free feature uniformization
AU - Guo, Guangyu
AU - Han, Longfei
AU - Wang, Le
AU - Zhang, Dingwen
AU - Han, Junwei
N1 - Publisher Copyright:
© The Author(s) 2023.
PY - 2023/12
Y1 - 2023/12
N2 - Knowledge distillation aims to distill knowledge from teacher networks to train student networks. Distilling intermediate features has attracted much attention in recent years as it can be flexibly applied in various fields such as image classification, object detection, and semantic segmentation. A critical obstacle of feature-based knowledge distillation is the dimension gap between the intermediate features of the teacher and student, and many methods have been proposed to resolve this problem. However, these works usually implement feature uniformization in an unsupervised way, lacking guidance to help the student network learn meaningful mapping functions in the uniformization process. Moreover, the dimension uniformization processes of the student and teacher networks are usually not equivalent, as the mapping functions differ. As a result, some factors of the feature are discarded during parametric feature alignment, or blended together by non-parametric operations. In this paper, we propose a novel semantic-aware knowledge distillation scheme to solve these problems. We build a standalone feature-based classification branch to extract semantic-aware knowledge for better guiding the learning process of the student network. In addition, to avoid the inequivalence of feature uniformization between teacher and student, we design a novel parameter-free self-attention operation that can convert features of different dimensions into vectors of the same length. Experimental results show that the proposed knowledge distillation scheme outperforms existing feature-based distillation methods on the widely used CIFAR-100 and CINIC-10 datasets.
AB - Knowledge distillation aims to distill knowledge from teacher networks to train student networks. Distilling intermediate features has attracted much attention in recent years as it can be flexibly applied in various fields such as image classification, object detection, and semantic segmentation. A critical obstacle of feature-based knowledge distillation is the dimension gap between the intermediate features of the teacher and student, and many methods have been proposed to resolve this problem. However, these works usually implement feature uniformization in an unsupervised way, lacking guidance to help the student network learn meaningful mapping functions in the uniformization process. Moreover, the dimension uniformization processes of the student and teacher networks are usually not equivalent, as the mapping functions differ. As a result, some factors of the feature are discarded during parametric feature alignment, or blended together by non-parametric operations. In this paper, we propose a novel semantic-aware knowledge distillation scheme to solve these problems. We build a standalone feature-based classification branch to extract semantic-aware knowledge for better guiding the learning process of the student network. In addition, to avoid the inequivalence of feature uniformization between teacher and student, we design a novel parameter-free self-attention operation that can convert features of different dimensions into vectors of the same length. Experimental results show that the proposed knowledge distillation scheme outperforms existing feature-based distillation methods on the widely used CIFAR-100 and CINIC-10 datasets.
KW - Feature uniformization
KW - Feature-based
KW - Knowledge distillation
KW - Parameter-free
KW - Self-attention
KW - Semantic-aware
UR - http://www.scopus.com/inward/record.url?scp=85180555289&partnerID=8YFLogxK
U2 - 10.1007/s44267-023-00003-0
DO - 10.1007/s44267-023-00003-0
M3 - Article
AN - SCOPUS:85180555289
SN - 2097-3330
VL - 1
JO - Visual Intelligence
JF - Visual Intelligence
IS - 1
M1 - 6
ER -