Structured doubly stochastic matrix for graph based clustering

Xiaoqian Wang; Feiping Nie; Heng Huang

doi:10.1145/2939672.2939805

University of Texas at Arlington

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

45 Scopus citations

Abstract

Clustering is one of the most significant topics in machine learning and has been applied extensively across many areas. Its prevalence in both scientific research and industrial practice has drawn considerable attention. Among the many clustering methods that have been developed, graph-based clustering using an affinity matrix has received particular emphasis. Recent work uses a doubly stochastic matrix to normalize the input affinity matrix and enhance graph-based clustering models. Although the doubly stochastic matrix can improve clustering performance, its clustering structure is not as clear as expected, so a postprocessing step is required to extract the final clustering results, and that step may not be optimal. To address this problem, we propose a novel convex model that learns a structured doubly stochastic matrix by imposing a low-rank constraint on the graph Laplacian matrix. The new structured doubly stochastic matrix explicitly uncovers the clustering structure and encodes the probability that each pair of data points is connected, which enhances the clustering results. An efficient optimization algorithm is derived to solve the new objective. We also provide theoretical discussion showing that, depending on the input, our method has interesting connections with K-means and spectral graph cut models, respectively. We conduct experiments on both synthetic and benchmark datasets to validate the proposed method. The empirical results demonstrate that our model provides an approach to better solving the K-means clustering problem: using the cluster indicator produced by our model as initialization, K-means converges to a smaller objective function value with better clustering performance. Moreover, we compare the clustering performance of our model with spectral clustering and a related doubly stochastic model. On all datasets, our method performs as well as or better than the related methods.
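
The two ingredients named in the abstract, doubly stochastic normalization of an affinity matrix and the rank of the graph Laplacian, can be illustrated with a minimal sketch. This is not the paper's convex model or its optimization algorithm: the normalization below is the standard Sinkhorn-Knopp iteration, swapped in for illustration, and the function names (`sinkhorn_knopp`, `count_components`, `toy_affinity`) and the toy data are assumptions. The sketch relies on a standard fact from spectral graph theory: a graph Laplacian on n nodes has rank n - k exactly when the graph has k connected components, which is presumably why a low-rank Laplacian makes the cluster structure explicit.

```python
import numpy as np


def sinkhorn_knopp(affinity, n_iter=1000, tol=1e-9):
    """Scale a symmetric non-negative affinity matrix until its rows and
    columns sum to 1 (Sinkhorn-Knopp; an illustrative stand-in, not the
    paper's method)."""
    S = np.array(affinity, dtype=float)
    for _ in range(n_iter):
        S /= S.sum(axis=1, keepdims=True)  # make every row sum to 1
        S /= S.sum(axis=0, keepdims=True)  # make every column sum to 1
        if np.abs(S.sum(axis=1) - 1.0).max() < tol:
            break
    # Averaging with the transpose keeps row and column sums at 1
    # and removes any remaining asymmetry.
    return (S + S.T) / 2.0


def count_components(S, tol=1e-6):
    """Count near-zero eigenvalues of the Laplacian L = I - S.
    The degree matrix is the identity because every row of S sums to 1."""
    laplacian = np.eye(S.shape[0]) - S
    eigenvalues = np.linalg.eigvalsh(laplacian)
    return int(np.sum(eigenvalues < tol))


def toy_affinity(sizes=(3, 4), seed=0):
    """Hypothetical block-diagonal affinity: points are similar only
    within their own block, so each block is one cluster."""
    rng = np.random.default_rng(seed)
    n = sum(sizes)
    A = np.zeros((n, n))
    start = 0
    for size in sizes:
        block = rng.uniform(0.5, 1.0, size=(size, size))
        A[start:start + size, start:start + size] = (block + block.T) / 2.0
        start += size
    return A


A = toy_affinity()
S = sinkhorn_knopp(A)
print("row sums:", S.sum(axis=1).round(6))
print("connected components:", count_components(S))
```

On a real affinity matrix the blocks are rarely disconnected, so the Laplacian usually has only one zero eigenvalue; that gap is the motivation the abstract gives for learning a structured matrix instead of postprocessing a plain doubly stochastic one.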

Original language	English
Title of host publication	KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher	Association for Computing Machinery
Pages	1245-1254
Number of pages	10
ISBN (Electronic)	9781450342322
DOIs	https://doi.org/10.1145/2939672.2939805
State	Published - 13 Aug 2016
Externally published	Yes
Event	22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016 - San Francisco, United States
Duration	13 Aug 2016 → 17 Aug 2016

Publication series

Name	Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume	13-17-August-2016

Conference

Conference	22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016
Country/Territory	United States
City	San Francisco
Period	13/08/16 → 17/08/16

Keywords

Doubly stochastic matrix
Graph Laplacian
K-means clustering
Spectral clustering

Access to Document

10.1145/2939672.2939805

Cite this

Wang, X., Nie, F., & Huang, H. (2016). Structured doubly stochastic matrix for graph based clustering. In KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1245-1254). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Vol. 13-17-August-2016). Association for Computing Machinery. https://doi.org/10.1145/2939672.2939805

Wang, Xiaoqian ; Nie, Feiping ; Huang, Heng. / Structured doubly stochastic matrix for graph based clustering. KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2016. pp. 1245-1254 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).

@inproceedings{fbfae30e0a4d4f32a608f343b9d44cbb,

title = "Structured doubly stochastic matrix for graph based clustering",

keywords = "Doubly stochastic matrix, Graph Laplacian, K-means clustering, Spectral clustering",

author = "Xiaoqian Wang and Feiping Nie and Heng Huang",

note = "Publisher Copyright: {\textcopyright} 2016 ACM.; 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016 ; Conference date: 13-08-2016 Through 17-08-2016",

year = "2016",

month = aug,

day = "13",

doi = "10.1145/2939672.2939805",

language = "English",

series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

publisher = "Association for Computing Machinery",

pages = "1245--1254",

booktitle = "KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

Wang, X, Nie, F & Huang, H 2016, Structured doubly stochastic matrix for graph based clustering. in KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 13-17-August-2016, Association for Computing Machinery, pp. 1245-1254, 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016, San Francisco, United States, 13/08/16. https://doi.org/10.1145/2939672.2939805

Structured doubly stochastic matrix for graph based clustering. / Wang, Xiaoqian; Nie, Feiping; Huang, Heng.
KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2016. p. 1245-1254 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Vol. 13-17-August-2016).

TY - GEN

T1 - Structured doubly stochastic matrix for graph based clustering

AU - Wang, Xiaoqian

AU - Nie, Feiping

AU - Huang, Heng

PY - 2016/8/13

Y1 - 2016/8/13

KW - Doubly stochastic matrix

KW - Graph Laplacian

KW - K-means clustering

KW - Spectral clustering

UR - http://www.scopus.com/inward/record.url?scp=84984950698&partnerID=8YFLogxK

U2 - 10.1145/2939672.2939805

DO - 10.1145/2939672.2939805

M3 - Conference contribution

AN - SCOPUS:84984950698

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 1245

EP - 1254

BT - KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery

T2 - 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016

Y2 - 13 August 2016 through 17 August 2016

ER -

Wang X, Nie F, Huang H. Structured doubly stochastic matrix for graph based clustering. In KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery. 2016. p. 1245-1254. (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). doi: 10.1145/2939672.2939805
