A self-training algorithm based on the two-stage data editing method with mass-based dissimilarity

Jikui Wang; Yiwen Wu; Shaobo Li; Feiping Nie

doi:10.1016/j.neunet.2023.09.046

School of Artificial Intelligence, OPtics and Electronics

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

A self-training algorithm is a classical semi-supervised learning algorithm that uses a small number of labeled samples and a large number of unlabeled samples to train a classifier. However, the existing self-training algorithms consider only the geometric distance between data while ignoring the data distribution when calculating the similarity between samples. In addition, misclassified samples can severely affect the performance of a self-training algorithm. To address the above two problems, this paper proposes a self-training algorithm based on data editing with mass-based dissimilarity (STDEMB). First, the mass matrix with the mass-based dissimilarity is obtained, and then the mass-based local density of each sample is determined based on its k nearest neighbors. Inspired by density peak clustering (DPC), this study designs a prototype tree based on the prototype concept. In addition, an efficient two-stage data editing algorithm is developed to edit misclassified samples and efficiently select high-confidence samples during the self-training process. The proposed STDEMB algorithm is verified by experiments using accuracy and F-score as evaluation metrics. The experimental results on 18 benchmark datasets demonstrate the effectiveness of the proposed STDEMB algorithm.
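The generic self-training loop that the abstract starts from can be sketched as follows. This is a minimal illustration and not the STDEMB method: it assumes a nearest-centroid base classifier, plain Euclidean distance instead of mass-based dissimilarity, and a simple confidence threshold instead of the two-stage data editing step. The function name `self_train` and the parameters `threshold` and `max_rounds` are hypothetical.

```python
import math

def self_train(labeled, unlabeled, threshold=0.8, max_rounds=10):
    """Generic self-training with a nearest-centroid base classifier.

    labeled: list of (point, label) pairs; unlabeled: list of points.
    Returns the enlarged labeled list and the points left unlabeled.
    Assumption: Euclidean distance, not the paper's mass-based dissimilarity.
    """
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        if not unlabeled:
            break
        # Fit: one centroid per class from the current labeled set.
        groups = {}
        for x, y in labeled:
            groups.setdefault(y, []).append(x)
        centroids = {y: [sum(col) / len(pts) for col in zip(*pts)]
                     for y, pts in groups.items()}
        accepted, remaining = [], []
        for x in unlabeled:
            # Softmax over negative distances serves as a crude confidence.
            dist = {y: math.dist(x, c) for y, c in centroids.items()}
            d_min = min(dist.values())
            weights = {y: math.exp(-(d - d_min)) for y, d in dist.items()}
            best = max(weights, key=weights.get)
            confidence = weights[best] / sum(weights.values())
            if confidence >= threshold:
                accepted.append((x, best))  # pseudo-label a confident sample
            else:
                remaining.append(x)
        if not accepted:
            break  # nothing confident enough; stop early
        labeled.extend(accepted)
        unlabeled = remaining
    return labeled, unlabeled

seed = [((0.0, 0.0), 0), ((10.0, 10.0), 1)]
pool = [(0.5, 0.2), (9.5, 10.1), (5.0, 5.0)]
grown, left = self_train(seed, pool)
```

In this toy run the two points near the seeds are pseudo-labeled, while the ambiguous midpoint stays unlabeled. A wrong pseudo-label accepted at this step would be reused in every later round, which is the failure mode the abstract's data editing stage is meant to address.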

Original language	English
Pages (from-to)	431-449
Number of pages	19
Journal	Neural Networks
Volume	168
DOIs	https://doi.org/10.1016/j.neunet.2023.09.046
State	Published - Nov 2023

Keywords

Data editing
Mass-based dissimilarity
Relative node set
Self-training algorithm

Cite this

@article{a9b9e82f57f745a98405b762a3e93db2,

title = "A self-training algorithm based on the two-stage data editing method with mass-based dissimilarity",

abstract = "A self-training algorithm is a classical semi-supervised learning algorithm that uses a small number of labeled samples and a large number of unlabeled samples to train a classifier. However, the existing self-training algorithms consider only the geometric distance between data while ignoring the data distribution when calculating the similarity between samples. In addition, misclassified samples can severely affect the performance of a self-training algorithm. To address the above two problems, this paper proposes a self-training algorithm based on data editing with mass-based dissimilarity (STDEMB). First, the mass matrix with the mass-based dissimilarity is obtained, and then the mass-based local density of each sample is determined based on its k nearest neighbors. Inspired by density peak clustering (DPC), this study designs a prototype tree based on the prototype concept. In addition, an efficient two-stage data editing algorithm is developed to edit misclassified samples and efficiently select high-confidence samples during the self-training process. The proposed STDEMB algorithm is verified by experiments using accuracy and F-score as evaluation metrics. The experimental results on 18 benchmark datasets demonstrate the effectiveness of the proposed STDEMB algorithm.",

keywords = "Data editing, Mass-based dissimilarity, Relative node set, Self-training algorithm",

author = "Jikui Wang and Yiwen Wu and Shaobo Li and Feiping Nie",

note = "Publisher Copyright: {\textcopyright} 2023 Elsevier Ltd",

year = "2023",

month = nov,

doi = "10.1016/j.neunet.2023.09.046",

language = "English",

volume = "168",

pages = "431--449",

journal = "Neural Networks",

issn = "0893-6080",

publisher = "Elsevier Ltd",

}

TY - JOUR

T1 - A self-training algorithm based on the two-stage data editing method with mass-based dissimilarity

AU - Wang, Jikui

AU - Wu, Yiwen

AU - Li, Shaobo

AU - Nie, Feiping

PY - 2023/11

Y1 - 2023/11

AB - A self-training algorithm is a classical semi-supervised learning algorithm that uses a small number of labeled samples and a large number of unlabeled samples to train a classifier. However, the existing self-training algorithms consider only the geometric distance between data while ignoring the data distribution when calculating the similarity between samples. In addition, misclassified samples can severely affect the performance of a self-training algorithm. To address the above two problems, this paper proposes a self-training algorithm based on data editing with mass-based dissimilarity (STDEMB). First, the mass matrix with the mass-based dissimilarity is obtained, and then the mass-based local density of each sample is determined based on its k nearest neighbors. Inspired by density peak clustering (DPC), this study designs a prototype tree based on the prototype concept. In addition, an efficient two-stage data editing algorithm is developed to edit misclassified samples and efficiently select high-confidence samples during the self-training process. The proposed STDEMB algorithm is verified by experiments using accuracy and F-score as evaluation metrics. The experimental results on 18 benchmark datasets demonstrate the effectiveness of the proposed STDEMB algorithm.

KW - Data editing

KW - Mass-based dissimilarity

KW - Relative node set

KW - Self-training algorithm

UR - http://www.scopus.com/inward/record.url?scp=85174703517&partnerID=8YFLogxK

U2 - 10.1016/j.neunet.2023.09.046

DO - 10.1016/j.neunet.2023.09.046

M3 - Article

C2 - 37804746

AN - SCOPUS:85174703517

SN - 0893-6080

VL - 168

SP - 431

EP - 449

JO - Neural Networks

JF - Neural Networks

ER -
