TY - JOUR
T1 - Adaptive Graph K-Means
AU - Pei, Shenfei
AU - Sun, Yuanchen
AU - Nie, Feiping
AU - Jiang, Xudong
AU - Zheng, Zengwei
N1 - Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2025/5
Y1 - 2025/5
N2 - Clustering large-scale datasets has received increasing attention recently. However, existing algorithms are still not efficient in scenarios with an extremely large number of clusters. To this end, Adaptive Graph K-Means (AGKM) is proposed in this work. Its idea originates from k-means, but it operates on an adaptive k-Nearest Neighbor (k-NN) graph instead of data features. First, AGKM is highly efficient for processing datasets where both the number of samples and the number of clusters are very large. Specifically, the time and space complexity are both linear w.r.t. the number of samples and, more importantly, independent of the number of clusters. Second, AGKM is designed for balanced clusters. This constraint is realized by adding a regularization term to the loss function and a simple modification of the graph in the optimization algorithm, which does not increase the computational burden. Finally, the indicator and dissimilarity matrices are learned simultaneously, so the proposed AGKM obtains the final partition directly with higher efficacy and efficiency. Experiments on several datasets validate the advantages of AGKM. In particular, over 29X and 46X speed-ups with respect to k-means are observed on the two large-scale datasets WebFace and CelebA, respectively.
AB - Clustering large-scale datasets has received increasing attention recently. However, existing algorithms are still not efficient in scenarios with an extremely large number of clusters. To this end, Adaptive Graph K-Means (AGKM) is proposed in this work. Its idea originates from k-means, but it operates on an adaptive k-Nearest Neighbor (k-NN) graph instead of data features. First, AGKM is highly efficient for processing datasets where both the number of samples and the number of clusters are very large. Specifically, the time and space complexity are both linear w.r.t. the number of samples and, more importantly, independent of the number of clusters. Second, AGKM is designed for balanced clusters. This constraint is realized by adding a regularization term to the loss function and a simple modification of the graph in the optimization algorithm, which does not increase the computational burden. Finally, the indicator and dissimilarity matrices are learned simultaneously, so the proposed AGKM obtains the final partition directly with higher efficacy and efficiency. Experiments on several datasets validate the advantages of AGKM. In particular, over 29X and 46X speed-ups with respect to k-means are observed on the two large-scale datasets WebFace and CelebA, respectively.
KW - Clustering
KW - Computational efficiency
KW - Graph-based
KW - k-means
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85211575636&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2024.111226
DO - 10.1016/j.patcog.2024.111226
M3 - Article
AN - SCOPUS:85211575636
SN - 0031-3203
VL - 161
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 111226
ER -