Large-Scale Clustering With Structured Optimal Bipartite Graph

Han Zhang; Feiping Nie; Xuelong Li

doi:10.1109/TPAMI.2023.3277532

School of Artificial Intelligence, Optics and Electronics

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

22 Scopus citations

Abstract

The widespread arising of data size gives rise to the necessity of undertaking large-scale data clustering tasks. To do so, the bipartite graph theory is frequently applied to design a scalable algorithm, which depicts the relations between samples and a few anchors, instead of binding pairwise samples. However, the bipartite graphs and existing spectral embedding methods ignore the explicit cluster structure learning. They have to obtain cluster labels by using post-processing like K-Means. More than that, existing anchor-based approaches always acquire anchors by using centroids of K-Means or a few random samples, both of which are time-saving but performance-unstable. In this paper, we investigate the scalability, stableness and integration in large-scale graph clustering. We propose a cluster-structured graph learning model, thus obtaining a c-connected (c is the cluster number) bipartite graph and also getting discrete labels straightforward. Taking data feature or pairwise relation as a start point, we further design an initialization-independent anchor selection strategy. Experimental results reported for synthetic and real-world datasets demonstrate the proposed method outperforms its peers.

Original language	English
Pages (from-to)	9950-9963
Number of pages	14
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	45
Issue number	8
DOIs	https://doi.org/10.1109/TPAMI.2023.3277532
State	Published - 1 Aug 2023

Keywords

Anchor selection
bipartite graph
discrete labels
large-scale clustering
pairwise relation

Access to Document

10.1109/TPAMI.2023.3277532

Cite this

@article{53eee386594b42c0b4210cc6841abb7f,

title = "Large-Scale Clustering With Structured Optimal Bipartite Graph",

abstract = "The widespread arising of data size gives rise to the necessity of undertaking large-scale data clustering tasks. To do so, the bipartite graph theory is frequently applied to design a scalable algorithm, which depicts the relations between samples and a few anchors, instead of binding pairwise samples. However, the bipartite graphs and existing spectral embedding methods ignore the explicit cluster structure learning. They have to obtain cluster labels by using post-processing like K-Means. More than that, existing anchor-based approaches always acquire anchors by using centroids of K-Means or a few random samples, both of which are time-saving but performance-unstable. In this paper, we investigate the scalability, stableness and integration in large-scale graph clustering. We propose a cluster-structured graph learning model, thus obtaining a c-connected (c is the cluster number) bipartite graph and also getting discrete labels straightforward. Taking data feature or pairwise relation as a start point, we further design an initialization-independent anchor selection strategy. Experimental results reported for synthetic and real-world datasets demonstrate the proposed method outperforms its peers.",

keywords = "Anchor selection, bipartite graph, discrete labels, large-scale clustering, pairwise relation",

author = "Han Zhang and Feiping Nie and Xuelong Li",

note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",

year = "2023",

month = aug,

day = "1",

doi = "10.1109/TPAMI.2023.3277532",

language = "English",

volume = "45",

pages = "9950--9963",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

number = "8",

}

TY - JOUR

T1 - Large-Scale Clustering With Structured Optimal Bipartite Graph

AU - Zhang, Han

AU - Nie, Feiping

AU - Li, Xuelong

PY - 2023/8/1

Y1 - 2023/8/1

N2 - The widespread arising of data size gives rise to the necessity of undertaking large-scale data clustering tasks. To do so, the bipartite graph theory is frequently applied to design a scalable algorithm, which depicts the relations between samples and a few anchors, instead of binding pairwise samples. However, the bipartite graphs and existing spectral embedding methods ignore the explicit cluster structure learning. They have to obtain cluster labels by using post-processing like K-Means. More than that, existing anchor-based approaches always acquire anchors by using centroids of K-Means or a few random samples, both of which are time-saving but performance-unstable. In this paper, we investigate the scalability, stableness and integration in large-scale graph clustering. We propose a cluster-structured graph learning model, thus obtaining a c-connected (c is the cluster number) bipartite graph and also getting discrete labels straightforward. Taking data feature or pairwise relation as a start point, we further design an initialization-independent anchor selection strategy. Experimental results reported for synthetic and real-world datasets demonstrate the proposed method outperforms its peers.

AB - The widespread arising of data size gives rise to the necessity of undertaking large-scale data clustering tasks. To do so, the bipartite graph theory is frequently applied to design a scalable algorithm, which depicts the relations between samples and a few anchors, instead of binding pairwise samples. However, the bipartite graphs and existing spectral embedding methods ignore the explicit cluster structure learning. They have to obtain cluster labels by using post-processing like K-Means. More than that, existing anchor-based approaches always acquire anchors by using centroids of K-Means or a few random samples, both of which are time-saving but performance-unstable. In this paper, we investigate the scalability, stableness and integration in large-scale graph clustering. We propose a cluster-structured graph learning model, thus obtaining a c-connected (c is the cluster number) bipartite graph and also getting discrete labels straightforward. Taking data feature or pairwise relation as a start point, we further design an initialization-independent anchor selection strategy. Experimental results reported for synthetic and real-world datasets demonstrate the proposed method outperforms its peers.

KW - Anchor selection

KW - bipartite graph

KW - discrete labels

KW - large-scale clustering

KW - pairwise relation

UR - http://www.scopus.com/inward/record.url?scp=85160232256&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2023.3277532

DO - 10.1109/TPAMI.2023.3277532

M3 - Article

C2 - 37200121

AN - SCOPUS:85160232256

SN - 0162-8828

VL - 45

SP - 9950

EP - 9963

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 8

ER -
