Sensitive information recognition based on short text sentiment analysis

Yang Li, Quan Pan, Tao Yang

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Existing sensitive information recognition relies on sensitive keyword matching, so its accuracy is low and its miss rate is high. We present a collaborative method that uses both sensitive keywords and sentiment polarities to identify sensitive information. On a real dataset, we used a supervised approach to measure the sentiment polarity of blogs and divided them into two categories: blogs with positive and blogs with negative sentiment polarity. We defined 2,639 sensitive keywords in five categories (pornography, violence, illegality, cult, and reactionary) and found, from the Zipf distribution of these words in the dataset, that the contents of blogs with negative sentiment polarity exhibited high sensitivity. We then studied the contribution of the sensitive keywords to the sentiment polarity and constructed a sensitivity-degree model that includes a sentiment polarity factor. Based on this model, we proposed a new way to identify sensitive information, which improved the accuracy from 31.25% to 58.75% and the miss rate from 95% to 96%, and raised the F-measure from 47.0% to 72.3%.
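
The F-measure figures quoted in the abstract can be roughly sanity-checked with the standard balanced F-measure (the harmonic mean of precision and recall). The sketch below rests on an assumption that the abstract does not state: that the 31.25% and 58.75% figures play the role of recall and the 95% and 96% figures play the role of precision. The function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# ASSUMPTION (not stated in the abstract): 95% / 96% are treated as precision,
# and 31.25% / 58.75% are treated as recall.
baseline = f_measure(precision=0.95, recall=0.3125)  # keyword matching only
proposed = f_measure(precision=0.96, recall=0.5875)  # keywords plus sentiment

print(f"baseline F-measure = {baseline:.3f}")
print(f"proposed F-measure = {proposed:.3f}")
```

Under this reading the baseline comes out near 47.0%, matching the abstract, while the proposed method comes out near 72.9%, close to but not identical with the reported 72.3%. The paper may therefore define or round its metrics differently.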

Original language: English
Pages (from-to): 80-84
Number of pages: 5
Journal: Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University
Volume: 50
Issue number: 9
State: Published - 10 Sep 2016

Keywords

  • Sensitive information
  • Sentiment analysis
  • Social networks
