Consensus spectral clustering in near-linear time

Dijun Luo, Chris Ding, Heng Huang, Feiping Nie

科研成果: 书/报告/会议事项章节会议稿件同行评审

19 引用 (Scopus)

摘要

This paper addresses the scalability issue in spectral analysis which has been widely used in data management applications. Spectral analysis techniques enjoy powerful clustering capability while suffer from high computational complexity. In most of previous research, the bottleneck of computational complexity of spectral analysis stems from the construction of pairwise similarity matrix among objects, which costs at least O(n2) where n is the number of the data points. In this paper, we propose a novel estimator of the similarity matrix using K-means accumulative consensus matrix which is intrinsically sparse. The computational cost of the accumulative consensus matrix is O(nlogn). We further develop a Non-negative Matrix Factorization approach to derive clustering assignment. The overall complexity of our approach remains O(nlogn). In order to validate our method, we (1) theoretically show the local preserving and convergent property of the similarity estimator, (2) validate it by a large number of real world datasets and compare the results to other state-of-the-art spectral analysis, and (3) apply it to large-scale data clustering problems. Results show that our approach uses much less computational time than other state-of-the-art clustering methods, meanwhile provides comparable clustering qualities. We also successfully apply our approach to a 5-million dataset on a single machine using reasonable time. Our techniques open a new direction for high-quality large-scale data analysis.

源语言英语
主期刊名2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
1079-1090
页数12
DOI
出版状态已出版 - 2011
已对外发布
活动2011 IEEE 27th International Conference on Data Engineering, ICDE 2011 - Hannover, 德国
期限: 11 4月 201116 4月 2011

出版系列

姓名Proceedings - International Conference on Data Engineering
ISSN(印刷版)1084-4627

会议

会议2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
国家/地区德国
Hannover
时期11/04/1116/04/11

指纹

探究 'Consensus spectral clustering in near-linear time' 的科研主题。它们共同构成独一无二的指纹。

引用此