TY - JOUR
T1 - Subspace clustering by directly solving Discriminative K-means
AU - Gao, Chenhui
AU - Chen, Wenzhi
AU - Nie, Feiping
AU - Yu, Weizhong
AU - Yan, Feihu
N1 - Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2022/9/27
Y1 - 2022/9/27
N2 - Applications in many domains, such as text mining and natural language processing, need to deal with high-dimensional data. High-dimensional data may exhibit better clustering characteristics in a selected low-dimensional subspace. Subspace clustering projects the data onto a low-dimensional subspace before clustering. Traditional subspace clustering methods employ eigenvalue decomposition to find the projection of the input data and then perform K-means or kernel K-means to obtain the clustering matrix. Such methods are not only inefficient but also rely on a two-step procedure that yields only an approximate solution. Although Discriminative K-means (DisKmeans) integrates dimensionality reduction and clustering into a joint framework and solves the optimization problem by kernel K-means, it must iteratively find the centroids in the kernel space and the class labels, and it has quadratic time complexity. Accordingly, in this paper, we propose an algorithm, namely Fast DisKmeans (FDKM), to obtain the cluster indicator matrix directly. Moreover, our proposed method has linear time complexity, a significant reduction compared with the quadratic time complexity of DisKmeans. We also demonstrate that solving the objective function of DisKmeans is equivalent to representing the cluster assignment matrix by a low-dimensional linear mapping of the data. Based on this observation, we propose a second algorithm, namely Iterative Fast DisKmeans (IFDKM), which also has linear time complexity. A series of experiments was conducted on several datasets, and the experimental results showed the superior performance of FDKM and IFDKM.
AB - Applications in many domains, such as text mining and natural language processing, need to deal with high-dimensional data. High-dimensional data may exhibit better clustering characteristics in a selected low-dimensional subspace. Subspace clustering projects the data onto a low-dimensional subspace before clustering. Traditional subspace clustering methods employ eigenvalue decomposition to find the projection of the input data and then perform K-means or kernel K-means to obtain the clustering matrix. Such methods are not only inefficient but also rely on a two-step procedure that yields only an approximate solution. Although Discriminative K-means (DisKmeans) integrates dimensionality reduction and clustering into a joint framework and solves the optimization problem by kernel K-means, it must iteratively find the centroids in the kernel space and the class labels, and it has quadratic time complexity. Accordingly, in this paper, we propose an algorithm, namely Fast DisKmeans (FDKM), to obtain the cluster indicator matrix directly. Moreover, our proposed method has linear time complexity, a significant reduction compared with the quadratic time complexity of DisKmeans. We also demonstrate that solving the objective function of DisKmeans is equivalent to representing the cluster assignment matrix by a low-dimensional linear mapping of the data. Based on this observation, we propose a second algorithm, namely Iterative Fast DisKmeans (IFDKM), which also has linear time complexity. A series of experiments was conducted on several datasets, and the experimental results showed the superior performance of FDKM and IFDKM.
KW - K-means
KW - Subspace clustering
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85135715666&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2022.109452
DO - 10.1016/j.knosys.2022.109452
M3 - Article
AN - SCOPUS:85135715666
SN - 0950-7051
VL - 252
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 109452
ER -