TY - JOUR
T1 - Cauchy balanced nonnegative matrix factorization
AU - Xiong, He
AU - Kong, Deguang
AU - Nie, Feiping
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive licence to Springer Nature B.V.
PY - 2023/10
Y1 - 2023/10
N2 - Nonnegative Matrix Factorization (NMF) plays an important role in many data mining and machine learning tasks. Standard NMF uses the Frobenius norm as the loss function, which is well known to be sensitive to noise. To address this issue, we propose a robust formulation of NMF, i.e., Cauchy-NMF, which is derived under the assumption that the noise generally follows an independent and identically distributed (i.i.d.) Cauchy distribution. In particular, we derive the Cauchy Balanced NMF model (Cauchy-B-NMF) using the Cauchy distribution, where (a) the numerical value of each element in the coefficient matrix is viewed as a posterior probability, which allows the clustering result to be obtained directly from the coefficient matrix without any additional post-processing; (b) a novel manifold regularization term is incorporated into the loss function, explicitly making distant data points have dissimilar embeddings while implicitly making neighbouring data points have similar embeddings; (c) a balanced clustering term is enforced to achieve the desired equal number of data points across different clusters. We derive an efficient computational algorithm to solve the resultant optimization problem and provide a rigorous analysis of the algorithm's convergence. Experimental results on several benchmarks demonstrate the effectiveness of our algorithm, which consistently provides better clustering results than many other NMF variants.
KW - Balanced
KW - Cauchy
KW - Clustering
KW - NMF
KW - Posterior probabilistic
KW - Robust
UR - http://www.scopus.com/inward/record.url?scp=85150647105&partnerID=8YFLogxK
U2 - 10.1007/s10462-022-10379-y
DO - 10.1007/s10462-022-10379-y
M3 - Article
AN - SCOPUS:85150647105
SN - 0269-2821
VL - 56
SP - 11867
EP - 11903
JO - Artificial Intelligence Review
JF - Artificial Intelligence Review
IS - 10
ER -