TY - CONF
T1 - Probabilistic non-negative matrix factorization and its robust extensions for topic modeling
AU - Luo, Minnan
AU - Nie, Feiping
AU - Chang, Xiaojun
AU - Yang, Yi
AU - Hauptmann, Alexander
AU - Zheng, Qinghua
N1 - Publisher Copyright:
Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2017
Y1 - 2017
N2 - Traditional topic models with maximum likelihood estimation inevitably suffer from the conditional independence assumption on words given the document's topic distribution. In this paper, we follow the generative procedure of topic models and learn the topic-word distribution and topic distribution by directly approximating the word-document co-occurrence matrix with matrix decomposition techniques. These methods include: (1) approximating the normalized document-word conditional distribution with the document probability matrix and word probability matrix based on probabilistic non-negative matrix factorization (NMF); (2) since standard NMF is well known to be non-robust to noise and outliers, extending the probabilistic NMF of the topic model to robust versions using ℓ2,1-norm and capped ℓ2,1-norm based loss functions, respectively. The proposed framework inherits the explicit probabilistic meaning of factors in topic models and simultaneously makes the conditional independence assumption on words unnecessary. Straightforward and efficient algorithms are developed to solve the corresponding non-smooth and non-convex problems. Experimental results on several benchmark datasets illustrate the effectiveness and superiority of the proposed methods.
AB - Traditional topic models with maximum likelihood estimation inevitably suffer from the conditional independence assumption on words given the document's topic distribution. In this paper, we follow the generative procedure of topic models and learn the topic-word distribution and topic distribution by directly approximating the word-document co-occurrence matrix with matrix decomposition techniques. These methods include: (1) approximating the normalized document-word conditional distribution with the document probability matrix and word probability matrix based on probabilistic non-negative matrix factorization (NMF); (2) since standard NMF is well known to be non-robust to noise and outliers, extending the probabilistic NMF of the topic model to robust versions using ℓ2,1-norm and capped ℓ2,1-norm based loss functions, respectively. The proposed framework inherits the explicit probabilistic meaning of factors in topic models and simultaneously makes the conditional independence assumption on words unnecessary. Straightforward and efficient algorithms are developed to solve the corresponding non-smooth and non-convex problems. Experimental results on several benchmark datasets illustrate the effectiveness and superiority of the proposed methods.
UR - http://www.scopus.com/inward/record.url?scp=85030458565&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85030458565
SP - 2308
EP - 2314
T2 - 31st AAAI Conference on Artificial Intelligence, AAAI 2017
Y2 - 4 February 2017 through 10 February 2017
ER -