Unsupervised measure of Chinese lexical semantic similarity using correlated graph model for news story segmentation

Wei Feng, Xuecheng Nie, Yujun Zhang, Lei Xie, Jianwu Dang

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

This paper presents a simple yet effective unsupervised approach to measuring Chinese lexical semantic similarity, and demonstrates its promising performance in automatic story segmentation of Mandarin broadcast news. Our approach centers on the unsupervised correlated affinity graph (UCAG) model, which is initialized as a hybrid sparse graph encoding both explicit word-to-word contextual correlations and latent word-to-character correlations within the given corpus. The UCAG model then diffuses the initial sparse correlations throughout the graph by parallel affinity propagation. This yields a dense, reliable, and corpus-specific lexical semantic similarity measure derived purely from unlabeled data. We then generalize the classical cosine similarity metric to take soft similarities into account for story segmentation. Extensive experiments on benchmark datasets validate the superiority of the proposed similarity measure over previous measures. Specifically, our similarity measure yields an average relative F1-score improvement of 7.7% over state-of-the-art normalized cuts (NCuts) based story segmentation on two holistic benchmark Mandarin broadcast news corpora, TDT2 and CCTV, and a 10.8% relative F1-score improvement on the detailed broadcast news subsets.
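
The idea of generalizing cosine similarity to take soft similarities into account can be illustrated with a minimal sketch. It assumes the commonly used "soft cosine" form, in which a word-to-word similarity matrix weights the inner product; the abstract does not give the paper's exact formulation, so this is not necessarily the authors' definition. The function name `soft_cosine`, the toy vocabulary, and the similarity values are all hypothetical.

```python
import math

def soft_cosine(a, b, sim):
    """Cosine similarity of term-frequency vectors a and b, weighted by sim.

    sim[i][j] is an assumed symmetric similarity between word i and word j,
    with sim[i][i] == 1. When sim is the identity matrix, this reduces to
    the classical cosine similarity.
    """
    def bilinear(x, y):
        # Computes x^T * sim * y.
        return sum(x[i] * sim[i][j] * y[j]
                   for i in range(len(x)) for j in range(len(y)))

    denom = math.sqrt(bilinear(a, a) * bilinear(b, b))
    return bilinear(a, b) / denom if denom > 0 else 0.0

# Toy three-word vocabulary; words 0 and 1 are assumed to be near-synonyms.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
soft = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]

sent_a = [1, 0, 1]  # contains word 0 and word 2
sent_b = [0, 1, 1]  # contains word 1 and word 2

hard_score = soft_cosine(sent_a, sent_b, identity)  # classical cosine
soft_score = soft_cosine(sent_a, sent_b, soft)      # similarity-aware cosine
```

With the identity matrix the two sentences share only one word, so the score stays low. With the soft matrix, the assumed similarity between words 0 and 1 raises the score, which is the effect a dense lexical similarity measure is meant to provide for sentences that use related but non-identical words.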

Original language: English
Pages (from-to): 236-247
Number of pages: 12
Journal: Neurocomputing
Volume: 318
State: Published - 27 Nov 2018

Keywords

  • Common character correlation
  • Contextual correlation
  • Generalized cosine similarity
  • Parallel affinity propagation
  • Story segmentation
  • Unsupervised correlated affinity graph (UCAG) model
