Spectral clustering of large-scale data by directly solving normalized cut

Xiaojun Chen, Dan He, Weijun Hong, Min Yang, Feiping Nie, Joshua Zhexue Huang

科研成果: 书/报告/会议事项章节会议稿件同行评审

65 引用 (Scopus)

摘要

During the past decades, many spectral clustering algorithms have been proposed. However, their high computational complexities hinder their applications on large-scale data. Moreover, most of them use a two-step approach to obtain the optimal solution, which may deviate from the solution by directly solving the original problem. In this paper, we propose a new optimization algorithm, namely Direct Normalized Cut (DNC), to directly optimize the normalized cut model. DNC has a quadratic time complexity, which is a significant reduction comparing with the cubic time complexity of the traditional spectral clustering. To cope with large-scale data, a Fast Normalized Cut (FNC) method with linear time and space complexities is proposed by extending DNC with an anchor-based strategy. In the new method, we first seek a set of anchors and then construct a representative similarity matrix by computing distances between the anchors and the whole data set. To find high quality anchors that best represent the whole data set, we propose a Balanced k-means (BKM) to partition a data set into balanced clusters and use the cluster centers as anchors. Then DNC is used to obtain the final clustering result from the representative similarity matrix. A series of experiments were conducted on both synthetic data and real-world data sets, and the experimental results show the superior performance of BKM, DNC and FNC.

源语言英语
主期刊名KDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
出版商Association for Computing Machinery
1206-1215
页数10
ISBN(印刷版)9781450355520
DOI
出版状态已出版 - 19 7月 2018
活动24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018 - London, 英国
期限: 19 8月 201823 8月 2018

出版系列

姓名Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

会议

会议24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018
国家/地区英国
London
时期19/08/1823/08/18

指纹

探究 'Spectral clustering of large-scale data by directly solving normalized cut' 的科研主题。它们共同构成独一无二的指纹。

引用此