A robust entropy regularized K-means clustering algorithm for processing noise in datasets

Peilin Jiang, Junnan Cao, Weizhong Yu, Feiping Nie

Research output: Contribution to journal › Article › peer-review

Abstract

K-means is a widely used clustering algorithm. Because of its simple implementation and powerful functionality, it is applied in fields such as data mining, cluster analysis, data preprocessing, and unsupervised learning. However, K-means is sensitive to outliers: if a low-dimensional sample set contains a certain number of outliers, the resulting cluster centers are heavily disturbed, which degrades the clustering results. Outliers can be detected before clustering, but this two-stage approach affects the accuracy of the clustering results. To address this issue, we propose an improved robust Entropy Regularized K-Means clustering algorithm. Our method builds on the Entropy Regularized K-Means clustering algorithm and adds a weight to the objective function so that outlying data are ignored and a more accurate number of clusters is obtained, thereby performing clustering and outlier detection simultaneously. The advantages of the algorithm are strong robustness to interference, cluster centers that are not affected by outliers, and simultaneous clustering and detection. We tested the improved algorithm on artificial and real datasets, demonstrating that it determines cluster centers more accurately and identifies some of the outliers.
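The abstract does not state the objective function, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes the standard entropy-regularized soft assignment (a row-wise softmax of the negative squared distances divided by a temperature) and a swapped-in, hypothetical weighting rule: the samples with the largest assignment cost receive weight 0 and are excluded from the center update. The names `robust_erkm`, `gamma`, `n_outliers`, and `init` are illustrative assumptions, and the sketch does not estimate the number of clusters.

```python
import numpy as np


def robust_erkm(X, n_clusters, gamma=1.0, n_outliers=0, n_iter=50, init=None, seed=0):
    """Sketch: entropy regularized K-means with hypothetical 0/1 sample weights.

    Returns (centers, memberships, outlier_mask). The trimming rule used for
    the weights is an assumption, not the weighting scheme of the paper.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if init is None:
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=n_clusters, replace=False)].copy()
    else:
        centers = np.asarray(init, dtype=float).copy()

    for _ in range(n_iter):
        # Squared Euclidean distance of every sample to every center: shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

        # Soft memberships: softmax(-d2 / gamma) per row, computed stably.
        logits = -d2 / gamma
        logits -= logits.max(axis=1, keepdims=True)
        U = np.exp(logits)
        U /= U.sum(axis=1, keepdims=True)

        # Hypothetical weights: zero out the n_outliers samples with highest cost.
        cost = (U * d2).sum(axis=1)
        weights = np.ones(n)
        if n_outliers > 0:
            weights[np.argsort(cost)[-n_outliers:]] = 0.0

        # Weighted center update; clusters with no remaining mass keep their center.
        WU = U * weights[:, None]
        mass = WU.sum(axis=0)
        new_centers = centers.copy()
        nonempty = mass > 1e-12
        new_centers[nonempty] = (WU.T @ X)[nonempty] / mass[nonempty][:, None]

        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < 1e-9:
            break

    return centers, U, weights == 0.0
```

With `n_outliers=0` the function reduces to plain entropy regularized K-means, so running it both ways on the same data gives a rough sense of how much a few distant points pull the centers.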

Original language: English
Article number: 106518
Pages (from-to): 6617-6632
Number of pages: 16
Journal: Neural Computing and Applications
Volume: 37
Issue number: 9
State: Published - Mar 2025

Keywords

  • Clustering
  • Entropy regularized K-means
  • Outlier
  • Robust
