TY - JOUR
T1 - An efficient semi-supervised balanced cut with hard pairwise constraints and partial labels
AU - Yu, Weizhong
AU - Xing, Liyin
AU - Nie, Feiping
AU - Li, Xuelong
N1 - Publisher Copyright:
© 2023 Elsevier B.V.
PY - 2023/9/27
Y1 - 2023/9/27
N2 - With a small scale of prior information, including pairwise constraints and partial labels, semi-supervised clustering algorithms can improve clustering performance by imposing certain restrictions on the clustering results, and have accordingly attracted a lot of attention in recent years. Moreover, a large number of graph-based semi-supervised clustering algorithms have been proposed, since graphs are naturally suited to pairwise constraints. However, most graph-based approaches involve a two-step process and are not scalable to large-scale datasets due to their high time complexity. Additionally, they also encounter a serious problem of constraint violation. To deal with this issue, we propose a new non-parametric model to directly obtain the indicator matrix by minimizing the cut of the graph with a regularization term that corresponds to the balance of the clusters. To simultaneously support the two forms of prior information, our approach restricts the indicator matrix during optimization, while also ensuring that the clusters obtained by our approach do not violate any constraints. The computational complexity of the proposed approach is nearly linear to the data size n. Experiments show that the algorithm proposed in this paper achieves significant improvements in both clustering performance and efficiency.
AB - With a small scale of prior information, including pairwise constraints and partial labels, semi-supervised clustering algorithms can improve clustering performance by imposing certain restrictions on the clustering results, and have accordingly attracted a lot of attention in recent years. Moreover, a large number of graph-based semi-supervised clustering algorithms have been proposed, since graphs are naturally suited to pairwise constraints. However, most graph-based approaches involve a two-step process and are not scalable to large-scale datasets due to their high time complexity. Additionally, they also encounter a serious problem of constraint violation. To deal with this issue, we propose a new non-parametric model to directly obtain the indicator matrix by minimizing the cut of the graph with a regularization term that corresponds to the balance of the clusters. To simultaneously support the two forms of prior information, our approach restricts the indicator matrix during optimization, while also ensuring that the clusters obtained by our approach do not violate any constraints. The computational complexity of the proposed approach is nearly linear to the data size n. Experiments show that the algorithm proposed in this paper achieves significant improvements in both clustering performance and efficiency.
KW - Graph-based clustering
KW - Pairwise constraints
KW - Semi-supervised clustering
UR - http://www.scopus.com/inward/record.url?scp=85165227364&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2023.110747
DO - 10.1016/j.knosys.2023.110747
M3 - 文章
AN - SCOPUS:85165227364
SN - 0950-7051
VL - 276
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 110747
ER -