TY - JOUR
T1 - Large-Scale Clustering With Anchor-Based Constrained Laplacian Rank
AU - Ma, Zhenyu
AU - Wang, Jingyu
AU - Nie, Feiping
AU - Li, Xuelong
N1 - Publisher Copyright:
© 1989-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Graph-based clustering technique has garnered significant attention due to precise information characterization by pairwise graph similarity. Nevertheless, the post-processing step in traditional methods often limits clustering effects because of crucial information loss. Therefore, the Constrained Laplacian Rank (CLR) theory emerges to directly obtain discrete labels from optimally structural graph, achieving desirable outcomes. However, CLR suffers from substantial time overhead, making it infeasible for large-scale data analysis. To overcome this issue, we propose Anchor-based CLR (ACLR), a simple yet effective method for efficient large-scale clustering. The ACLR method comprises four stages: (1) anchors that roughly cover original data are opted to prepare bipartite graph construction; (2) a novel two-step probability transition (TSPT) strategy initializes a small-scale graph with random walk probability among anchors; (3) the main ACLR model alternately optimizes the graph connected structure and directly produces discrete anchor labels, achieving a time complexity independent of the number of samples due to dramatically reduced graph scale; and (4) labels are propagated from anchors to samples using K-NN algorithm. Extensive experiments demonstrate that ACLR yields superior accuracy and efficiency, particularly when applied to large-scale data.
AB - Graph-based clustering technique has garnered significant attention due to precise information characterization by pairwise graph similarity. Nevertheless, the post-processing step in traditional methods often limits clustering effects because of crucial information loss. Therefore, the Constrained Laplacian Rank (CLR) theory emerges to directly obtain discrete labels from optimally structural graph, achieving desirable outcomes. However, CLR suffers from substantial time overhead, making it infeasible for large-scale data analysis. To overcome this issue, we propose Anchor-based CLR (ACLR), a simple yet effective method for efficient large-scale clustering. The ACLR method comprises four stages: (1) anchors that roughly cover original data are opted to prepare bipartite graph construction; (2) a novel two-step probability transition (TSPT) strategy initializes a small-scale graph with random walk probability among anchors; (3) the main ACLR model alternately optimizes the graph connected structure and directly produces discrete anchor labels, achieving a time complexity independent of the number of samples due to dramatically reduced graph scale; and (4) labels are propagated from anchors to samples using K-NN algorithm. Extensive experiments demonstrate that ACLR yields superior accuracy and efficiency, particularly when applied to large-scale data.
KW - Anchor
KW - bipartite graph
KW - constrained laplacian rank
KW - graph connected structure
KW - label propagation
KW - large-scale clustering
KW - two-step probability transition
UR - http://www.scopus.com/inward/record.url?scp=105002559680&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2025.3557718
DO - 10.1109/TKDE.2025.3557718
M3 - 文章
AN - SCOPUS:105002559680
SN - 1041-4347
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
ER -