TY - JOUR
T1 - Self-Labeling and Self-Knowledge Distillation Unsupervised Feature Selection
AU - Ling, Yunzhi
AU - Nie, Feiping
AU - Yu, Weizhong
AU - Li, Xuelong
N1 - Publisher Copyright:
© 1989-2012 IEEE.
PY - 2025
Y1 - 2025
AB - This paper proposes a deep pseudo-label method for unsupervised feature selection, which learns non-linear representations to generate pseudo-labels and trains a Neural Network (NN) to select informative features via self-Knowledge Distillation (KD). Specifically, the proposed method divides a standard NN into two sub-components, an encoder and a predictor, and introduces a dependency subnet. It works by pretraining the encoder in a self-supervised manner to produce informative representations and then alternating between two steps: (1) learning pseudo-labels by combining the clustering results of the encoder's outputs with the NN's prediction outputs, and (2) updating the NN's parameters by globally selecting a subset of features to predict the pseudo-labels while updating the subnet's parameters through self-KD. Self-KD is achieved by encouraging the subnet to locally capture a subset of the NN's features to produce class probabilities that match those produced by the NN. This allows the model to self-absorb the learned inter-class knowledge and evaluate feature diversity, removing redundant features without sacrificing performance. Meanwhile, the potential discriminative capability of an NN can also be self-excavated without the assistance of other NNs. The two alternating steps reinforce each other: in step (2), predicting the learned pseudo-labels and conducting self-KD gradually enhances the discrimination of the outputs of both the NN and the encoder, while the self-labeling method in step (1) leverages these two improvements to further refine the pseudo-labels for step (2), resulting in superior performance. Extensive experiments show that the proposed method significantly outperforms state-of-the-art methods across various datasets.
KW - knowledge distillation
KW - neural network
KW - pseudo-labels
KW - unsupervised feature selection
KW - unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=105003495881&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2025.3561046
DO - 10.1109/TKDE.2025.3561046
M3 - Article
AN - SCOPUS:105003495881
SN - 1041-4347
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
ER -