TY - JOUR
T1 - Semi-supervised Self-Training Algorithm for Density Peak Membership Optimization
AU - Liu, Xuewen
AU - Wang, Jikui
AU - Yang, Zhengguo
AU - Li, Bing
AU - Nie, Feiping
N1 - Publisher Copyright:
© 2022, Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
PY - 2022/9/1
Y1 - 2022/9/1
N2 - Most data contain only a few labels because obtaining labels is costly in reality. Compared with supervised and unsupervised learning, semi-supervised learning can achieve higher learning performance at a lower labeling cost by making full use of the large amount of unlabeled data and the small amount of labeled data in datasets. The Self-Training algorithm is a classical semi-supervised learning algorithm. In the process of iteratively optimizing the classifier, high-confidence samples are continuously selected from the unlabeled samples and labeled by the base classifier; these samples and their pseudo-labels are then added to the training set. Selecting high-confidence samples is a critical step in the Self-Training algorithm. Inspired by the density peaks clustering (DPC) algorithm, this paper proposes a semi-supervised Self-Training algorithm for density peak membership optimization (STDPM), which uses density peaks to select high-confidence samples. Firstly, STDPM uses density peaks to discover the potential spatial structure information of the samples and constructs a prototype tree. Secondly, STDPM searches for the unlabeled direct relatives of the labeled samples in the prototype tree, and defines the density peak of the unlabeled direct relatives that belong to different clusters as the clusters-peak. Then, the clusters-peak is turned into the density peak membership after normalization. Finally, STDPM regards samples whose membership is greater than a set threshold as high-confidence samples, which are labeled by the base classifier and added to the training set. STDPM makes full use of the density and distance information implied by the peaks, which improves the selection quality of high-confidence samples and further improves classification performance. Comparative experiments on 8 benchmark datasets verify the effectiveness of STDPM.
AB - Most data contain only a few labels because obtaining labels is costly in reality. Compared with supervised and unsupervised learning, semi-supervised learning can achieve higher learning performance at a lower labeling cost by making full use of the large amount of unlabeled data and the small amount of labeled data in datasets. The Self-Training algorithm is a classical semi-supervised learning algorithm. In the process of iteratively optimizing the classifier, high-confidence samples are continuously selected from the unlabeled samples and labeled by the base classifier; these samples and their pseudo-labels are then added to the training set. Selecting high-confidence samples is a critical step in the Self-Training algorithm. Inspired by the density peaks clustering (DPC) algorithm, this paper proposes a semi-supervised Self-Training algorithm for density peak membership optimization (STDPM), which uses density peaks to select high-confidence samples. Firstly, STDPM uses density peaks to discover the potential spatial structure information of the samples and constructs a prototype tree. Secondly, STDPM searches for the unlabeled direct relatives of the labeled samples in the prototype tree, and defines the density peak of the unlabeled direct relatives that belong to different clusters as the clusters-peak. Then, the clusters-peak is turned into the density peak membership after normalization. Finally, STDPM regards samples whose membership is greater than a set threshold as high-confidence samples, which are labeled by the base classifier and added to the training set. STDPM makes full use of the density and distance information implied by the peaks, which improves the selection quality of high-confidence samples and further improves classification performance. Comparative experiments on 8 benchmark datasets verify the effectiveness of STDPM.
KW - clusters-peak
KW - density peak membership
KW - direct relative node sets
KW - prototype tree
KW - self-training
UR - http://www.scopus.com/inward/record.url?scp=85146517299&partnerID=8YFLogxK
U2 - 10.3778/j.issn.1673-9418.2102018
DO - 10.3778/j.issn.1673-9418.2102018
M3 - Article
AN - SCOPUS:85146517299
SN - 1673-9418
VL - 16
SP - 2078
EP - 2088
JO - Journal of Frontiers of Computer Science and Technology
JF - Journal of Frontiers of Computer Science and Technology
IS - 9
ER -