Spectral clustering of large-scale data by directly solving normalized cut

Xiaojun Chen, Dan He, Weijun Hong, Min Yang, Feiping Nie, Joshua Zhexue Huang

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

69 Scopus citations

Abstract

During the past decades, many spectral clustering algorithms have been proposed. However, their high computational complexities hinder their applications on large-scale data. Moreover, most of them use a two-step approach to obtain the optimal solution, which may deviate from the solution by directly solving the original problem. In this paper, we propose a new optimization algorithm, namely Direct Normalized Cut (DNC), to directly optimize the normalized cut model. DNC has a quadratic time complexity, which is a significant reduction comparing with the cubic time complexity of the traditional spectral clustering. To cope with large-scale data, a Fast Normalized Cut (FNC) method with linear time and space complexities is proposed by extending DNC with an anchor-based strategy. In the new method, we first seek a set of anchors and then construct a representative similarity matrix by computing distances between the anchors and the whole data set. To find high quality anchors that best represent the whole data set, we propose a Balanced k-means (BKM) to partition a data set into balanced clusters and use the cluster centers as anchors. Then DNC is used to obtain the final clustering result from the representative similarity matrix. A series of experiments were conducted on both synthetic data and real-world data sets, and the experimental results show the superior performance of BKM, DNC and FNC.

Original languageEnglish
Title of host publicationKDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages1206-1215
Number of pages10
ISBN (Print)9781450355520
DOIs
StatePublished - 19 Jul 2018
Event24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018 - London, United Kingdom
Duration: 19 Aug 201823 Aug 2018

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018
Country/TerritoryUnited Kingdom
CityLondon
Period19/08/1823/08/18

Keywords

  • Clustering
  • Large-scale data
  • Normalized cut

Fingerprint

Dive into the research topics of 'Spectral clustering of large-scale data by directly solving normalized cut'. Together they form a unique fingerprint.

Cite this