TY - JOUR
T1 - Spectral embedded clustering
T2 - A framework for in-sample and out-of-sample spectral clustering
AU - Nie, Feiping
AU - Zeng, Zinan
AU - Tsang, Ivor W.
AU - Xu, Dong
AU - Zhang, Changshui
PY - 2011/11
Y1 - 2011/11
N2 - Spectral clustering (SC) methods have been successfully applied to many real-world applications. The success of these SC methods is largely based on the manifold assumption, namely, that two nearby data points in the high-density region of a low-dimensional data manifold have the same cluster label. However, such an assumption might not always hold on high-dimensional data. When the data do not exhibit a clear low-dimensional manifold structure (e.g., high-dimensional and sparse data), the clustering performance of SC will be degraded and become even worse than K-means clustering. In this paper, motivated by the observation that the true cluster assignment matrix for high-dimensional data can always be embedded in a linear space spanned by the data, we propose the spectral embedded clustering (SEC) framework, in which a linearity regularization is explicitly added into the objective function of SC methods. More importantly, the proposed SEC framework can naturally deal with out-of-sample data. We also present a new Laplacian matrix constructed from a local regression of each pattern and incorporate it into our SEC framework to capture both local and global discriminative information for clustering. Comprehensive experiments on eight real-world high-dimensional datasets demonstrate the effectiveness and advantages of our SEC framework over existing SC methods and K-means-based clustering methods. Our SEC framework significantly outperforms SC using the Nyström method on unseen data.
AB - Spectral clustering (SC) methods have been successfully applied to many real-world applications. The success of these SC methods is largely based on the manifold assumption, namely, that two nearby data points in the high-density region of a low-dimensional data manifold have the same cluster label. However, such an assumption might not always hold on high-dimensional data. When the data do not exhibit a clear low-dimensional manifold structure (e.g., high-dimensional and sparse data), the clustering performance of SC will be degraded and become even worse than K-means clustering. In this paper, motivated by the observation that the true cluster assignment matrix for high-dimensional data can always be embedded in a linear space spanned by the data, we propose the spectral embedded clustering (SEC) framework, in which a linearity regularization is explicitly added into the objective function of SC methods. More importantly, the proposed SEC framework can naturally deal with out-of-sample data. We also present a new Laplacian matrix constructed from a local regression of each pattern and incorporate it into our SEC framework to capture both local and global discriminative information for clustering. Comprehensive experiments on eight real-world high-dimensional datasets demonstrate the effectiveness and advantages of our SEC framework over existing SC methods and K-means-based clustering methods. Our SEC framework significantly outperforms SC using the Nyström method on unseen data.
KW - Linearity regularization
KW - out-of-sample clustering
KW - spectral clustering
KW - spectral embedded clustering
UR - http://www.scopus.com/inward/record.url?scp=80455143729&partnerID=8YFLogxK
U2 - 10.1109/TNN.2011.2162000
DO - 10.1109/TNN.2011.2162000
M3 - Article
C2 - 21965198
AN - SCOPUS:80455143729
SN - 1045-9227
VL - 22
SP - 1796
EP - 1808
JO - IEEE Transactions on Neural Networks
JF - IEEE Transactions on Neural Networks
IS - 11
M1 - 6030950
ER -