
Fast semi-supervised self-training algorithm based on data editing

  • Bing Li
  • Jikui Wang
  • Zhengguo Yang
  • Jihai Yi
  • Feiping Nie
  • Guizhou University
  • Lanzhou University of Finance and Economics

Research output: Contribution to journal › Article › peer-review

15 Scopus citations

Abstract

Self-training is a commonly used semi-supervised learning framework. Selecting high-confidence samples is a crucial step for algorithms based on the self-training framework. To alleviate the impact of noisy data, researchers have proposed many data editing methods to improve the selection quality of high-confidence samples. However, state-of-the-art data editing methods have high time complexity, no less than O(n^2), where n denotes the number of samples. To improve training speed while ensuring the quality of the selected high-confidence samples, and inspired by the Ball-k-means algorithm, we propose a fast semi-supervised self-training algorithm based on data editing (EBSA), which defines ball-cluster partition and editing to improve the quality of high-confidence samples. The time complexity of the proposed EBSA is O(t2kn + n log n + n + k^2), where k denotes the number of centers and t denotes the number of iterations. Because k is far less than n, EBSA has linear time complexity with respect to n. Extensive experiments on 20 benchmark data sets show that the proposed algorithm not only runs faster but also achieves better classification performance than the comparison algorithms.
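
For readers unfamiliar with the self-training framework the abstract refers to, the following is a minimal sketch of a generic self-training loop. It is not EBSA and contains no ball-cluster partition or data editing step. The nearest-centroid base classifier, the confidence rule (one minus the ratio of nearest to second-nearest centroid distance), the `threshold` parameter, and all function names are assumptions chosen for illustration only. The sketch assumes at least two classes are present in the labeled data.

```python
import math

def centroids(X, y):
    """Mean vector of each class among the currently labeled points."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict_with_confidence(x, cents):
    """Return (label, confidence) for x.

    Confidence is 1 - d_nearest / d_second (an illustrative rule, not the
    paper's): close to 1 when x is much nearer to one centroid, 0 on a tie.
    """
    dists = sorted((math.dist(x, c), label) for label, c in cents.items())
    (d1, label), (d2, _) = dists[0], dists[1]
    conf = 0.0 if d2 == 0 else 1.0 - d1 / d2
    return label, conf

def self_train(X_lab, y_lab, X_unlab, threshold=0.5, max_iter=10):
    """Generic self-training: repeatedly pseudo-label confident samples."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_iter):
        if not pool:
            break
        cents = centroids(X_lab, y_lab)
        scored = [(x, *predict_with_confidence(x, cents)) for x in pool]
        confident = [(x, lab) for x, lab, conf in scored if conf >= threshold]
        if not confident:
            break  # no high-confidence samples left to add
        for x, lab in confident:
            X_lab.append(x)
            y_lab.append(lab)
        # Low-confidence samples stay unlabeled for the next round.
        pool = [x for x, lab, conf in scored if conf < threshold]
    return centroids(X_lab, y_lab), pool
```

In this sketch, a sample lying midway between two class centroids is never pseudo-labeled and remains in the returned pool. Data editing methods such as the one the abstract proposes aim to make this selection step more reliable in the presence of noisy samples.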

Original language: English
Pages (from-to): 293-314
Number of pages: 22
Journal: Information Sciences
Volume: 626
State: Published - May 2023

Keywords

  • Ball-k-means
  • Data editing
  • Self-training
  • Semi-supervised learning
  • classification
