Sensitive information recognition based on short text sentiment analysis

Yang Li, Quan Pan, Tao Yang

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

Existing sensitive information recognition relies on sensitive keyword matching, so its accuracy is low and its miss rate is high. We present a collaborative method that uses both sensitive keywords and sentiment polarities to identify sensitive information. On a real dataset, we used a supervised approach to measure the sentiment polarity of blogs and divided the blogs into two categories: those with positive sentiment polarity and those with negative sentiment polarity. We defined 2,639 sensitive keywords in five categories (pornography, violence, illegality, cult and reactionary) and found, from the Zipf distribution of these words in the dataset, that blogs with negative sentiment polarity were highly sensitive. We then studied the contribution of the sensitive keywords to sentiment polarity and constructed a sensitivity degree model that includes a sentiment polarity factor. On this basis, we proposed a new way to identify sensitive information, which improved the accuracy and miss rate from 31.25% to 58.75% and from 95% to 96%, respectively, and improved the F-measure from 47.0% to 72.3%.
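
The idea of weighting keyword evidence by a sentiment polarity factor can be illustrated with a minimal Python sketch. The abstract does not give the actual sensitivity degree model, so everything below is an assumption made for illustration: the function names, the tiny keyword dictionary, the mixing weight `alpha`, the threshold, and the linear form of the score are all hypothetical. The value `negative_prob` stands in for the output of any supervised sentiment classifier.

```python
# Illustrative only: not the model from the paper, whose form is not
# stated in the abstract. All names and constants here are hypothetical.

# Hypothetical keyword weights; the paper's 2,639 keywords are not listed.
SENSITIVE_KEYWORDS = {"violence": 1.0, "cult": 1.0}


def keyword_score(text, keywords):
    """Sum the weights of the keywords that occur in the text.

    Uses plain case-insensitive substring matching, which is what a
    keyword-only baseline would rely on.
    """
    lowered = text.lower()
    return sum(weight for word, weight in keywords.items() if word in lowered)


def sensitivity(text, negative_prob, keywords, alpha=0.5):
    """Scale the keyword score by an assumed sentiment polarity factor.

    negative_prob: probability in [0, 1] that the text has negative
    sentiment polarity, supplied by an external classifier.
    alpha: assumed share of the score that does not depend on sentiment.
    """
    if not 0.0 <= negative_prob <= 1.0:
        raise ValueError("negative_prob must lie in [0, 1]")
    polarity_factor = alpha + (1.0 - alpha) * negative_prob
    return keyword_score(text, keywords) * polarity_factor


def is_sensitive(text, negative_prob, keywords, threshold=0.8, alpha=0.5):
    """Flag the text when its sensitivity reaches an assumed threshold."""
    return sensitivity(text, negative_prob, keywords, alpha) >= threshold
```

Under these assumptions, a post that contains a keyword and is classified as strongly negative is flagged, while the same keyword in a post classified as positive falls below the threshold. A keyword-only baseline would treat both posts identically.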

Original language: English
Pages (from-to): 80-84
Number of pages: 5
Journal: Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University
Volume: 50
Issue number: 9
Publication status: Published - 10 Sep 2016
