TY - JOUR
T1 - A robust entropy regularized K-means clustering algorithm for processing noise in datasets
AU - Jiang, Peilin
AU - Cao, Junnan
AU - Yu, Weizhong
AU - Nie, Feiping
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2025.
PY - 2025/3
Y1 - 2025/3
AB - K-means is one of the clustering algorithms. Due to its simple implementation and powerful functionality, it is widely used in fields such as data mining, cluster analysis, data preprocessing, and unsupervised learning. However, the K-means algorithm suffers from the problem of being sensitive to outliers. If there are a certain number of outliers in a low-dimensional sample set, the resulting cluster centers will be greatly disturbed, affecting the clustering results. We can certainly detect outliers before clustering, but this phased approach has an impact on the accuracy of clustering results. To address this issue, we propose an improved robust Entropy Regularized K-Means clustering algorithm. Our method is based on the Entropy Regularized K-Means clustering algorithm and adds a weight value to the optimization function to ignore out-of-bounds data, and obtain a more accurate number of clusters in the dataset, thereby achieving synchronous clustering and detection. The advantages of this algorithm are strong anti-interference ability, the ability to ignore the influence of outliers on cluster centers, and synchronous clustering and detection. We tested our improved algorithm on artificial and real datasets, demonstrating that it can better determine cluster centers and find some outlier data.
KW - Clustering
KW - Entropy regularized K-means
KW - Outlier
KW - Robust
UR - http://www.scopus.com/inward/record.url?scp=85217187207&partnerID=8YFLogxK
U2 - 10.1007/s00521-024-10899-4
DO - 10.1007/s00521-024-10899-4
M3 - Article
AN - SCOPUS:85217187207
SN - 0941-0643
VL - 37
SP - 6617
EP - 6632
JO - Neural Computing and Applications
JF - Neural Computing and Applications
IS - 9
M1 - 106518
ER -