TY - GEN
T1 - Consensus spectral clustering in near-linear time
AU - Luo, Dijun
AU - Ding, Chris
AU - Huang, Heng
AU - Nie, Feiping
PY - 2011
Y1 - 2011
N2 - This paper addresses the scalability issue in spectral analysis, which has been widely used in data management applications. Spectral analysis techniques enjoy powerful clustering capability but suffer from high computational complexity. In most previous research, the computational bottleneck of spectral analysis stems from the construction of the pairwise similarity matrix among objects, which costs at least O(n^2), where n is the number of data points. In this paper, we propose a novel estimator of the similarity matrix based on a K-means accumulative consensus matrix, which is intrinsically sparse. The computational cost of the accumulative consensus matrix is O(n log n). We further develop a Non-negative Matrix Factorization approach to derive the clustering assignment. The overall complexity of our approach remains O(n log n). To validate our method, we (1) theoretically show the locality-preserving and convergence properties of the similarity estimator, (2) validate it on a large number of real-world datasets and compare the results to other state-of-the-art spectral analysis methods, and (3) apply it to large-scale data clustering problems. Results show that our approach uses much less computational time than other state-of-the-art clustering methods while providing comparable clustering quality. We also successfully apply our approach to a 5-million-point dataset on a single machine in reasonable time. Our techniques open a new direction for high-quality large-scale data analysis.
UR - http://www.scopus.com/inward/record.url?scp=79957818756&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2011.5767925
DO - 10.1109/ICDE.2011.5767925
M3 - Conference contribution
AN - SCOPUS:79957818756
SN - 9781424489589
T3 - Proceedings - International Conference on Data Engineering
SP - 1079
EP - 1090
BT - 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
T2 - 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
Y2 - 11 April 2011 through 16 April 2011
ER -