TY - CONF
T1 - Probabilistic non-negative matrix factorization and its robust extensions for topic modeling
AU - Luo, Minnan
AU - Nie, Feiping
AU - Chang, Xiaojun
AU - Yang, Yi
AU - Hauptmann, Alexander
AU - Zheng, Qinghua
N1 - Publisher Copyright:
Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2017
Y1 - 2017
N2 - Traditional topic models with maximum likelihood estimation inevitably suffer from the conditional independence assumption on words given the document's topic distribution. In this paper, we follow the generative procedure of topic models and learn the topic-word distribution and topic distribution by directly approximating the word-document co-occurrence matrix with matrix decomposition techniques. These methods include: (1) approximating the normalized document-word conditional distribution with the document probability matrix and word probability matrix based on probabilistic non-negative matrix factorization (NMF); (2) since standard NMF is well known to be non-robust to noise and outliers, extending the probabilistic NMF of the topic model to robust versions using ℓ2,1-norm and capped ℓ2,1-norm based loss functions, respectively. The proposed framework inherits the explicit probabilistic meaning of factors in topic models and simultaneously makes the conditional independence assumption on words unnecessary. Straightforward and efficient algorithms are developed to solve the corresponding non-smooth and non-convex problems. Experimental results on several benchmark datasets illustrate the effectiveness and superiority of the proposed methods.
AB - Traditional topic models with maximum likelihood estimation inevitably suffer from the conditional independence assumption on words given the document's topic distribution. In this paper, we follow the generative procedure of topic models and learn the topic-word distribution and topic distribution by directly approximating the word-document co-occurrence matrix with matrix decomposition techniques. These methods include: (1) approximating the normalized document-word conditional distribution with the document probability matrix and word probability matrix based on probabilistic non-negative matrix factorization (NMF); (2) since standard NMF is well known to be non-robust to noise and outliers, extending the probabilistic NMF of the topic model to robust versions using ℓ2,1-norm and capped ℓ2,1-norm based loss functions, respectively. The proposed framework inherits the explicit probabilistic meaning of factors in topic models and simultaneously makes the conditional independence assumption on words unnecessary. Straightforward and efficient algorithms are developed to solve the corresponding non-smooth and non-convex problems. Experimental results on several benchmark datasets illustrate the effectiveness and superiority of the proposed methods.
UR - http://www.scopus.com/inward/record.url?scp=85030458565&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85030458565
SP - 2308
EP - 2314
T2 - 31st AAAI Conference on Artificial Intelligence, AAAI 2017
Y2 - 4 February 2017 through 10 February 2017
ER -