A robust entropy regularized K-means clustering algorithm for processing noise in datasets

Peilin Jiang; Junnan Cao; Weizhong Yu; Feiping Nie

doi:10.1007/s00521-024-10899-4

School of Artificial Intelligence, OPtics and Electronics

Research output: Contribution to journal › Article › peer-review

Abstract

K-means is one of the clustering algorithms. Due to its simple implementation and powerful functionality, it is widely used in fields such as data mining, cluster analysis, data preprocessing, and unsupervised learning. However, the K-means algorithm suffers from the problem of being sensitive to outliers. If there are a certain number of outliers in a low-dimensional sample set, the resulting cluster centers will be greatly disturbed, affecting the clustering results. We can certainly detect outliers before clustering, but this phased approach has an impact on the accuracy of clustering results. To address this issue, we propose an improved robust Entropy Regularized K-Means clustering algorithm. Our method is based on the Entropy Regularized K-Means clustering algorithm and adds a weight value to the optimization function to ignore out-of-bounds data, and obtain a more accurate number of clusters in the dataset, thereby achieving synchronous clustering and detection. The advantages of this algorithm are strong anti-interference ability, the ability to ignore the influence of outliers on cluster centers, and synchronous clustering and detection. We tested our improved algorithm on artificial and real datasets, demonstrating that it can better determine cluster centers and find some outlier data.
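The abstract describes the method only at a high level, so the following is a minimal sketch of the general idea rather than the authors' formulation. It uses the standard entropy-regularized soft assignment (a softmax over negative squared distances scaled by a regularization parameter) and, as an assumption for illustration, a simple 0/1 weight that drops the points with the largest assignment cost from the center update. The names `robust_erkm`, `soft_assign`, `gamma`, and `n_outliers`, as well as the trimming rule itself, are hypothetical and do not come from the paper.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def soft_assign(x, centers, gamma):
    """Entropy-regularized memberships: softmax of -distance / gamma."""
    d = [sq_dist(x, c) for c in centers]
    shift = min(d)  # subtract the minimum for numerical stability
    e = [math.exp(-(dk - shift) / gamma) for dk in d]
    total = sum(e)
    return [ek / total for ek in e], d

def robust_erkm(points, init_centers, gamma=1.0, n_outliers=0, iters=50):
    """Soft K-means with a 0/1 weight per point (assumed trimming rule)."""
    centers = [list(c) for c in init_centers]
    n, dim = len(points), len(points[0])
    weights = [1] * n
    for _ in range(iters):
        memberships, costs = [], []
        for x in points:
            u, d = soft_assign(x, centers, gamma)
            memberships.append(u)
            # Membership-weighted distance of this point to the centers.
            costs.append(sum(uk * dk for uk, dk in zip(u, d)))
        # Assumption: the n_outliers costliest points get weight 0.
        ranked = sorted(range(n), key=lambda i: costs[i], reverse=True)
        flagged = set(ranked[:n_outliers])
        weights = [0 if i in flagged else 1 for i in range(n)]
        # Weighted center update; zero-weight points do not contribute.
        for k in range(len(centers)):
            mass = sum(weights[i] * memberships[i][k] for i in range(n))
            if mass > 0:
                centers[k] = [
                    sum(weights[i] * memberships[i][k] * points[i][j]
                        for i in range(n)) / mass
                    for j in range(dim)
                ]
    return centers, weights

# Toy data: two tight groups plus one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (100, 100)]
centers, weights = robust_erkm(points, [(0, 0), (11, 11)], n_outliers=1)
```

In this toy run the far-away point receives weight 0, so it is flagged during clustering and does not pull either center toward it. This illustrates the synchronous clustering and detection the abstract mentions, under the stated assumptions.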

Original language	English
Article number	106518
Pages (from-to)	6617-6632
Number of pages	16
Journal	Neural Computing and Applications
Volume	37
Issue number	9
DOIs	https://doi.org/10.1007/s00521-024-10899-4
State	Published - Mar 2025

Keywords

Clustering
Entropy regularized K-means
Outlier
Robust

Cite this

@article{56f5b006caab4bc6afc5dc010ac96ea2,

title = "A robust entropy regularized K-means clustering algorithm for processing noise in datasets",

abstract = "K-means is one of the clustering algorithms. Due to its simple implementation and powerful functionality, it is widely used in fields such as data mining, cluster analysis, data preprocessing, and unsupervised learning. However, the K-means algorithm suffers from the problem of being sensitive to outliers. If there are a certain number of outliers in a low-dimensional sample set, the resulting cluster centers will be greatly disturbed, affecting the clustering results. We can certainly detect outliers before clustering, but this phased approach has an impact on the accuracy of clustering results. To address this issue, we propose an improved robust Entropy Regularized K-Means clustering algorithm. Our method is based on the Entropy Regularized K-Means clustering algorithm and adds a weight value to the optimization function to ignore out-of-bounds data, and obtain a more accurate number of clusters in the dataset, thereby achieving synchronous clustering and detection. The advantages of this algorithm are strong anti-interference ability, the ability to ignore the influence of outliers on cluster centers, and synchronous clustering and detection. We tested our improved algorithm on artificial and real datasets, demonstrating that it can better determine cluster centers and find some outlier data.",

keywords = "Clustering, Entropy regularized K-means, Outlier, Robust",

author = "Peilin Jiang and Junnan Cao and Weizhong Yu and Feiping Nie",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2025.",

year = "2025",

month = mar,

doi = "10.1007/s00521-024-10899-4",

language = "English",

volume = "37",

pages = "6617--6632",

journal = "Neural Computing and Applications",

issn = "0941-0643",

publisher = "Springer London",

number = "9",

}

TY - JOUR

T1 - A robust entropy regularized K-means clustering algorithm for processing noise in datasets

AU - Jiang, Peilin

AU - Cao, Junnan

AU - Yu, Weizhong

AU - Nie, Feiping

N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2025.

PY - 2025/3

Y1 - 2025/3

N2 - K-means is one of the clustering algorithms. Due to its simple implementation and powerful functionality, it is widely used in fields such as data mining, cluster analysis, data preprocessing, and unsupervised learning. However, the K-means algorithm suffers from the problem of being sensitive to outliers. If there are a certain number of outliers in a low-dimensional sample set, the resulting cluster centers will be greatly disturbed, affecting the clustering results. We can certainly detect outliers before clustering, but this phased approach has an impact on the accuracy of clustering results. To address this issue, we propose an improved robust Entropy Regularized K-Means clustering algorithm. Our method is based on the Entropy Regularized K-Means clustering algorithm and adds a weight value to the optimization function to ignore out-of-bounds data, and obtain a more accurate number of clusters in the dataset, thereby achieving synchronous clustering and detection. The advantages of this algorithm are strong anti-interference ability, the ability to ignore the influence of outliers on cluster centers, and synchronous clustering and detection. We tested our improved algorithm on artificial and real datasets, demonstrating that it can better determine cluster centers and find some outlier data.

AB - K-means is one of the clustering algorithms. Due to its simple implementation and powerful functionality, it is widely used in fields such as data mining, cluster analysis, data preprocessing, and unsupervised learning. However, the K-means algorithm suffers from the problem of being sensitive to outliers. If there are a certain number of outliers in a low-dimensional sample set, the resulting cluster centers will be greatly disturbed, affecting the clustering results. We can certainly detect outliers before clustering, but this phased approach has an impact on the accuracy of clustering results. To address this issue, we propose an improved robust Entropy Regularized K-Means clustering algorithm. Our method is based on the Entropy Regularized K-Means clustering algorithm and adds a weight value to the optimization function to ignore out-of-bounds data, and obtain a more accurate number of clusters in the dataset, thereby achieving synchronous clustering and detection. The advantages of this algorithm are strong anti-interference ability, the ability to ignore the influence of outliers on cluster centers, and synchronous clustering and detection. We tested our improved algorithm on artificial and real datasets, demonstrating that it can better determine cluster centers and find some outlier data.

KW - Clustering

KW - Entropy regularized K-means

KW - Outlier

KW - Robust

UR - http://www.scopus.com/inward/record.url?scp=85217187207&partnerID=8YFLogxK

U2 - 10.1007/s00521-024-10899-4

DO - 10.1007/s00521-024-10899-4

M3 - Article

AN - SCOPUS:85217187207

SN - 0941-0643

VL - 37

SP - 6617

EP - 6632

JO - Neural Computing and Applications

JF - Neural Computing and Applications

IS - 9

M1 - 106518

ER -
