TY - JOUR
T1 - Rooted Mahalanobis distance based Gustafson-Kessel fuzzy C-means
AU - Chen, Qiang
AU - Yu, Weizhong
AU - Zhao, Xiaowei
AU - Nie, Feiping
AU - Li, Xuelong
N1 - Publisher Copyright: © 2023
PY - 2023/10
Y1 - 2023/10
AB - Fuzzy c-means (FCM) is a classic unsupervised clustering algorithm in machine learning. Euclidean distance is a frequently used distance metric in FCM, but it is only suitable for data with spherical clusters. Mahalanobis distance was therefore introduced in Gustafson-Kessel fuzzy c-means (GK-FCM) to improve performance on data with ellipsoidal clusters. However, GK-FCM and existing Mahalanobis distance based algorithms focus only on the squared Mahalanobis distance, because problems based on it are usually convex and easily solvable. The squared Mahalanobis distance is not a perfect metric, however: it tends to exaggerate the influence of outliers and leads to unsatisfactory results. In this paper, we propose a rooted Mahalanobis distance based GK-FCM model, which has better clustering performance and greater robustness than traditional GK-FCM. Owing to the introduction of the rooted Mahalanobis distance, optimizing the proposed model becomes non-trivial, and it is not realistic to obtain a closed-form solution like that of traditional GK-FCM. Drawing on the re-weighted method, we develop a novel convergent iterative algorithm to optimize the proposed model. Finally, extensive experiments are conducted on both synthetic and real-world data sets to demonstrate the superiority of the proposed model.
KW - Clustering
KW - Fuzzy C-means
KW - Gustafson Kessel
KW - Mahalanobis distance
KW - Unsupervised
UR - http://www.scopus.com/inward/record.url?scp=85161327675&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2023.03.103
DO - 10.1016/j.ins.2023.03.103
M3 - Article
AN - SCOPUS:85161327675
SN - 0020-0255
VL - 644
JO - Information Sciences
JF - Information Sciences
M1 - 118878
ER -