Large-Scale Clustering With Anchor-Based Constrained Laplacian Rank

Zhenyu Ma; Jingyu Wang; Feiping Nie; Xuelong Li

doi:10.1109/TKDE.2025.3557718

Institute of Optoelectronics and Intelligence

Research output: Contribution to journal › Article › peer-review

Abstract

Graph-based clustering techniques have garnered significant attention because pairwise graph similarity characterizes information precisely. Nevertheless, the post-processing step in traditional methods often limits clustering performance because it loses crucial information. The Constrained Laplacian Rank (CLR) theory was therefore introduced to obtain discrete labels directly from an optimally structured graph, achieving desirable outcomes. However, CLR suffers from substantial time overhead, making it infeasible for large-scale data analysis. To overcome this issue, we propose Anchor-based CLR (ACLR), a simple yet effective method for efficient large-scale clustering. The ACLR method comprises four stages: (1) anchors that roughly cover the original data are selected to prepare for bipartite graph construction; (2) a novel two-step probability transition (TSPT) strategy initializes a small-scale graph with random-walk probabilities among anchors; (3) the main ACLR model alternately optimizes the connected structure of the graph and directly produces discrete anchor labels, achieving a time complexity independent of the number of samples thanks to the dramatically reduced graph scale; and (4) labels are propagated from anchors to samples using the K-NN algorithm. Extensive experiments demonstrate that ACLR yields superior accuracy and efficiency, particularly on large-scale data.
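The four stages in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every function name, the Gaussian weighting, and the parameter `k` are assumptions made for illustration. In particular, the two-step transition is read here as an anchor-to-sample-to-anchor random walk, and the CLR optimization of stage (3) is replaced by a much simpler stand-in that reads labels off the connected components of the initial anchor graph, which only works when that graph already splits into the desired number of components.

```python
import numpy as np


def bipartite_graph(X, anchors, k):
    """Stage 1-2 input (assumed form): row-stochastic sample-to-anchor weights.

    Each sample keeps Gaussian weights to its k nearest anchors only.
    """
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)  # (n, m)
    n, m = d2.shape
    nn = np.argsort(d2, axis=1)[:, :k]          # indices of k nearest anchors
    rows = np.arange(n)[:, None]
    sigma2 = d2[rows, nn].mean() + 1e-12         # hypothetical bandwidth choice
    B = np.zeros((n, m))
    B[rows, nn] = np.exp(-d2[rows, nn] / sigma2)
    B /= B.sum(axis=1, keepdims=True)            # each row sums to 1
    return B


def two_step_transition(B):
    """Stage 2 (assumed reading): anchor -> sample -> anchor random walk.

    Step 1 leaves anchor a for sample i with probability B[i, a] / sum_i B[i, a];
    step 2 leaves sample i for anchor j with probability B[i, j].
    """
    col = np.maximum(B.sum(axis=0), 1e-12)       # guard against unused anchors
    return (B / col).T @ B                       # (m, m) anchor graph


def component_labels(A, tol=1e-12):
    """Stage 3 stand-in (NOT the CLR optimization): connected components."""
    m = A.shape[0]
    adj = (A + A.T) > tol                        # symmetrized adjacency
    labels = -np.ones(m, dtype=int)
    count = 0
    for start in range(m):
        if labels[start] >= 0:
            continue
        labels[start] = count
        stack = [start]
        while stack:                             # depth-first traversal
            u = stack.pop()
            for v in np.flatnonzero(adj[u]):
                if labels[v] < 0:
                    labels[v] = count
                    stack.append(v)
        count += 1
    return labels, count


def propagate_labels(B, anchor_labels):
    """Stage 4 (1-NN special case): copy the label of the nearest anchor."""
    return anchor_labels[B.argmax(axis=1)]
```

A typical call would pick anchors (for example `anchors = X[rng.choice(len(X), m, replace=False)]` with a NumPy random generator `rng`), then chain `bipartite_graph`, `two_step_transition`, `component_labels`, and `propagate_labels`. Only the first and last steps touch all samples; the anchor graph itself is m × m, which is the source of the sample-independent cost the abstract attributes to stage (3).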

Original language	English
Journal	IEEE Transactions on Knowledge and Data Engineering
DOI	https://doi.org/10.1109/TKDE.2025.3557718
Publication status	Accepted/In press - 2025

Access to document

10.1109/TKDE.2025.3557718

Cite this

@article{97c0b83def9e4d8197fb8e0dc7485edf,

title = "Large-Scale Clustering With Anchor-Based Constrained Laplacian Rank",

abstract = "Graph-based clustering technique has garnered significant attention due to precise information characterization by pairwise graph similarity. Nevertheless, the post-processing step in traditional methods often limits clustering effects because of crucial information loss. Therefore, the Constrained Laplacian Rank (CLR) theory emerges to directly obtain discrete labels from optimally structural graph, achieving desirable outcomes. However, CLR suffers from substantial time overhead, making it infeasible for large-scale data analysis. To overcome this issue, we propose Anchor-based CLR (ACLR), a simple yet effective method for efficient large-scale clustering. The ACLR method comprises four stages: (1) anchors that roughly cover original data are opted to prepare bipartite graph construction; (2) a novel two-step probability transition (TSPT) strategy initializes a small-scale graph with random walk probability among anchors; (3) the main ACLR model alternately optimizes the graph connected structure and directly produces discrete anchor labels, achieving a time complexity independent of the number of samples due to dramatically reduced graph scale; and (4) labels are propagated from anchors to samples using K-NN algorithm. Extensive experiments demonstrate that ACLR yields superior accuracy and efficiency, particularly when applied to large-scale data.",

keywords = "Anchor, bipartite graph, constrained laplacian rank, graph connected structure, label propagation, large-scale clustering, two-step probability transition",

author = "Zhenyu Ma and Jingyu Wang and Feiping Nie and Xuelong Li",

note = "Publisher Copyright: {\textcopyright} 1989-2012 IEEE.",

year = "2025",

doi = "10.1109/TKDE.2025.3557718",

language = "English",

journal = "IEEE Transactions on Knowledge and Data Engineering",

issn = "1041-4347",

publisher = "IEEE Computer Society",

}

TY - JOUR

T1 - Large-Scale Clustering With Anchor-Based Constrained Laplacian Rank

AU - Ma, Zhenyu

AU - Wang, Jingyu

AU - Nie, Feiping

AU - Li, Xuelong

PY - 2025

Y1 - 2025

AB - Graph-based clustering technique has garnered significant attention due to precise information characterization by pairwise graph similarity. Nevertheless, the post-processing step in traditional methods often limits clustering effects because of crucial information loss. Therefore, the Constrained Laplacian Rank (CLR) theory emerges to directly obtain discrete labels from optimally structural graph, achieving desirable outcomes. However, CLR suffers from substantial time overhead, making it infeasible for large-scale data analysis. To overcome this issue, we propose Anchor-based CLR (ACLR), a simple yet effective method for efficient large-scale clustering. The ACLR method comprises four stages: (1) anchors that roughly cover original data are opted to prepare bipartite graph construction; (2) a novel two-step probability transition (TSPT) strategy initializes a small-scale graph with random walk probability among anchors; (3) the main ACLR model alternately optimizes the graph connected structure and directly produces discrete anchor labels, achieving a time complexity independent of the number of samples due to dramatically reduced graph scale; and (4) labels are propagated from anchors to samples using K-NN algorithm. Extensive experiments demonstrate that ACLR yields superior accuracy and efficiency, particularly when applied to large-scale data.

KW - Anchor

KW - bipartite graph

KW - constrained laplacian rank

KW - graph connected structure

KW - label propagation

KW - large-scale clustering

KW - two-step probability transition

UR - http://www.scopus.com/inward/record.url?scp=105002559680&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2025.3557718

DO - 10.1109/TKDE.2025.3557718

M3 - Article

AN - SCOPUS:105002559680

SN - 1041-4347

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

ER -
