A LDA model based topic detection method

Lantian Guo, Yang Li, Dejun Mu, Tao Yang, Zhe Li

Research output: Contribution to journal, Article (peer-reviewed)

7 Scopus citations

Abstract

Topic detection is one of the most important techniques in hot topic extraction and evolution tracking. Detecting topics from large numbers of short texts in social networks is difficult for two reasons: the high dimensionality of the data, which hinders processing efficiency, and the maldistribution of topics, which makes the detected topics unclear. To address these challenges, we propose a new topic detection method based on the LDA (Latent Dirichlet Allocation) model, called CBOW-LDA. It uses the CBOW (Continuous Bag-of-Words) method to generate word vectors and then clusters words by vector similarity. This reduces the dimensionality of the LDA output and makes the topics clearer. An analysis of topic perplexity on a real-world dataset shows that topics detected by our method have lower perplexity than those obtained with word-frequency-weighted vectors: with the same number of topic words, perplexity is reduced by about 3%.
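The pipeline in the abstract can be illustrated with a small sketch. The sketch rests on several assumptions. CBOW word vectors are already trained, for example with a word2vec tool, and are passed in as a plain dict. Words are grouped by a hypothetical threshold-based "leader" clustering on cosine similarity. Each token is then replaced by its cluster id, so the vocabulary LDA sees is smaller. Perplexity is computed with the standard definition from the LDA literature, exp(-Σ log p(w) / N), where p(w|d) = Σ_k θ_dk φ_kw. The abstract does not specify the clustering algorithm, the threshold, or how clusters feed into LDA. The names `cluster_words`, `map_documents`, `perplexity`, and `threshold` are therefore illustrative only, and LDA training itself (which produces θ and φ) is omitted.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def cluster_words(vectors: Dict[str, List[float]], threshold: float) -> Dict[str, int]:
    """Hypothetical leader clustering (an assumption, not the paper's stated algorithm):
    a word joins the first cluster whose seed vector has cosine similarity
    >= threshold; otherwise it becomes the seed of a new cluster."""
    seeds: List[List[float]] = []
    assignment: Dict[str, int] = {}
    for word in sorted(vectors):  # sorted so cluster ids are deterministic
        vec = vectors[word]
        for cid, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                assignment[word] = cid
                break
        else:
            assignment[word] = len(seeds)
            seeds.append(vec)
    return assignment


def map_documents(docs: List[List[str]], assignment: Dict[str, int]) -> List[List[int]]:
    """Replace each token with its cluster id; tokens without a vector are dropped."""
    return [[assignment[w] for w in doc if w in assignment] for doc in docs]


def perplexity(docs: List[List[int]], theta: List[List[float]], phi: List[List[float]]) -> float:
    """Standard LDA perplexity: exp(-sum of log p(w|d) over all tokens / token count),
    with p(w|d) = sum_k theta[d][k] * phi[k][w]. Assumes every p(w|d) > 0
    (Dirichlet smoothing normally guarantees this)."""
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)
```

For example, with toy 2-D vectors such as `{"game": [1.0, 0.1], "match": [0.9, 0.2], "vote": [0.1, 1.0], "poll": [0.2, 0.9]}` and a threshold of 0.9, the sports words and the election words fall into two clusters. A four-word vocabulary therefore becomes a two-symbol one before any topic model is fitted. Lower perplexity means the fitted model assigns higher probability to the observed tokens. That is the quantity the abstract reports as reduced by about 3%.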

Original language: English
Pages (from-to): 698-702
Number of pages: 5
Journal: Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
Volume: 34
Issue number: 4
State: Published - 1 Aug 2016

Keywords

  • LDA model
  • Perplexity
  • Topic detection
  • Word vectors
