A robust entropy regularized K-means clustering algorithm for processing noise in datasets

Peilin Jiang; Junnan Cao; Weizhong Yu; Feiping Nie

doi:10.1007/s00521-024-10899-4

Institute of Optoelectronics and Intelligence

Research output: Contribution to journal › Article › peer-review

Abstract

K-means is one of the clustering algorithms. Due to its simple implementation and powerful functionality, it is widely used in fields such as data mining, cluster analysis, data preprocessing, and unsupervised learning. However, the K-means algorithm suffers from the problem of being sensitive to outliers. If there are a certain number of outliers in a low-dimensional sample set, the resulting cluster centers will be greatly disturbed, affecting the clustering results. We can certainly detect outliers before clustering, but this phased approach has an impact on the accuracy of clustering results. To address this issue, we propose an improved robust Entropy Regularized K-Means clustering algorithm. Our method is based on the Entropy Regularized K-Means clustering algorithm and adds a weight value to the optimization function to ignore out-of-bounds data, and obtain a more accurate number of clusters in the dataset, thereby achieving synchronous clustering and detection. The advantages of this algorithm are strong anti-interference ability, the ability to ignore the influence of outliers on cluster centers, and synchronous clustering and detection. We tested our improved algorithm on artificial and real datasets, demonstrating that it can better determine cluster centers and find some outlier data.
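The abstract does not state the objective function, so the following is only a minimal sketch of the general idea (soft, entropy-regularized assignments plus a per-point weight that switches suspected outliers off), not the authors' formulation. It assumes the common entropy-regularized objective, whose memberships are a softmax of the negative squared distances scaled by `gamma`, and a binary weight that zeroes out the `n_outliers` points farthest from every center. The function name `robust_erkm_sketch` and the parameters `gamma`, `n_outliers`, and `init_centers` are hypothetical.

```python
import numpy as np


def robust_erkm_sketch(X, init_centers, gamma=1.0, n_outliers=0, n_iter=50):
    """Illustrative entropy-regularized K-means with binary outlier weights.

    ASSUMPTION: this is not the paper's algorithm. Points with weight 0 are
    treated as outliers and do not contribute to the center update.
    """
    X = np.asarray(X, dtype=float)
    centers = np.array(init_centers, dtype=float)
    n = X.shape[0]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center: (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        # Soft memberships: softmax of -d2 / gamma, shifted for stability.
        U = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / gamma)
        U /= U.sum(axis=1, keepdims=True)
        # Outlier score: distance to the nearest center (an assumed heuristic).
        score = d2.min(axis=1)
        weights = np.ones(n)
        if n_outliers > 0:
            weights[np.argsort(score)[-n_outliers:]] = 0.0
        # Weighted center update; a cluster with no mass keeps its old center.
        WU = U * weights[:, None]
        mass = WU.sum(axis=0)
        new_centers = (WU.T @ X) / np.maximum(mass, 1e-12)[:, None]
        centers = np.where(mass[:, None] > 1e-12, new_centers, centers)
    return centers, U, weights == 0.0


# Two tight groups plus one distant point; the first point of each group
# is used as the initial center.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11],
              [100, 100]], dtype=float)
centers, U, is_outlier = robust_erkm_sketch(X, X[[0, 4]], n_outliers=1)
```

In this toy run the distant point receives weight 0, so it is flagged and has no pull on either center, which mirrors the "synchronous clustering and detection" behavior the abstract describes. Fixing the number of flagged points in advance is a simplification made here for brevity.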

Original language	English
Article number	106518
Pages (from-to)	6617-6632
Number of pages	16
Journal	Neural Computing and Applications
Volume	37
Issue number	9
DOI	https://doi.org/10.1007/s00521-024-10899-4
Publication status	Published - Mar 2025

Cite this

@article{56f5b006caab4bc6afc5dc010ac96ea2,

title = "A robust entropy regularized K-means clustering algorithm for processing noise in datasets",

abstract = "K-means is one of the clustering algorithms. Due to its simple implementation and powerful functionality, it is widely used in fields such as data mining, cluster analysis, data preprocessing, and unsupervised learning. However, the K-means algorithm suffers from the problem of being sensitive to outliers. If there are a certain number of outliers in a low-dimensional sample set, the resulting cluster centers will be greatly disturbed, affecting the clustering results. We can certainly detect outliers before clustering, but this phased approach has an impact on the accuracy of clustering results. To address this issue, we propose an improved robust Entropy Regularized K-Means clustering algorithm. Our method is based on the Entropy Regularized K-Means clustering algorithm and adds a weight value to the optimization function to ignore out-of-bounds data, and obtain a more accurate number of clusters in the dataset, thereby achieving synchronous clustering and detection. The advantages of this algorithm are strong anti-interference ability, the ability to ignore the influence of outliers on cluster centers, and synchronous clustering and detection. We tested our improved algorithm on artificial and real datasets, demonstrating that it can better determine cluster centers and find some outlier data.",

keywords = "Clustering, Entropy regularized K-means, Outlier, Robust",

author = "Peilin Jiang and Junnan Cao and Weizhong Yu and Feiping Nie",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2025.",

year = "2025",

month = mar,

doi = "10.1007/s00521-024-10899-4",

language = "English",

volume = "37",

pages = "6617--6632",

journal = "Neural Computing and Applications",

issn = "0941-0643",

publisher = "Springer London",

number = "9",

}

TY - JOUR

T1 - A robust entropy regularized K-means clustering algorithm for processing noise in datasets

AU - Jiang, Peilin

AU - Cao, Junnan

AU - Yu, Weizhong

AU - Nie, Feiping

N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2025.

PY - 2025/3

Y1 - 2025/3

N2 - K-means is one of the clustering algorithms. Due to its simple implementation and powerful functionality, it is widely used in fields such as data mining, cluster analysis, data preprocessing, and unsupervised learning. However, the K-means algorithm suffers from the problem of being sensitive to outliers. If there are a certain number of outliers in a low-dimensional sample set, the resulting cluster centers will be greatly disturbed, affecting the clustering results. We can certainly detect outliers before clustering, but this phased approach has an impact on the accuracy of clustering results. To address this issue, we propose an improved robust Entropy Regularized K-Means clustering algorithm. Our method is based on the Entropy Regularized K-Means clustering algorithm and adds a weight value to the optimization function to ignore out-of-bounds data, and obtain a more accurate number of clusters in the dataset, thereby achieving synchronous clustering and detection. The advantages of this algorithm are strong anti-interference ability, the ability to ignore the influence of outliers on cluster centers, and synchronous clustering and detection. We tested our improved algorithm on artificial and real datasets, demonstrating that it can better determine cluster centers and find some outlier data.

AB - K-means is one of the clustering algorithms. Due to its simple implementation and powerful functionality, it is widely used in fields such as data mining, cluster analysis, data preprocessing, and unsupervised learning. However, the K-means algorithm suffers from the problem of being sensitive to outliers. If there are a certain number of outliers in a low-dimensional sample set, the resulting cluster centers will be greatly disturbed, affecting the clustering results. We can certainly detect outliers before clustering, but this phased approach has an impact on the accuracy of clustering results. To address this issue, we propose an improved robust Entropy Regularized K-Means clustering algorithm. Our method is based on the Entropy Regularized K-Means clustering algorithm and adds a weight value to the optimization function to ignore out-of-bounds data, and obtain a more accurate number of clusters in the dataset, thereby achieving synchronous clustering and detection. The advantages of this algorithm are strong anti-interference ability, the ability to ignore the influence of outliers on cluster centers, and synchronous clustering and detection. We tested our improved algorithm on artificial and real datasets, demonstrating that it can better determine cluster centers and find some outlier data.

KW - Clustering

KW - Entropy regularized K-means

KW - Outlier

KW - Robust

UR - http://www.scopus.com/inward/record.url?scp=85217187207&partnerID=8YFLogxK

U2 - 10.1007/s00521-024-10899-4

DO - 10.1007/s00521-024-10899-4

M3 - Article

AN - SCOPUS:85217187207

SN - 0941-0643

VL - 37

SP - 6617

EP - 6632

JO - Neural Computing and Applications

JF - Neural Computing and Applications

IS - 9

M1 - 106518

ER -
