Unsupervised measure of Chinese lexical semantic similarity using correlated graph model for news story segmentation

  • Wei Feng
  • Xuecheng Nie
  • Yujun Zhang
  • Lei Xie
  • Jianwu Dang
  • Tianjin University
  • State Administration of Cultural Heritage
  • National University of Singapore
  • Japan Advanced Institute of Science and Technology

Research output: Contribution to journal › Article › Peer-reviewed

5 Citations (Scopus)

Abstract

This paper presents a simple yet effective approach to measuring Chinese lexical semantic similarity in an unsupervised manner, and shows its promising performance in automatic story segmentation of Mandarin broadcast news. Our approach centers on the unsupervised correlated affinity graph (UCAG) model, which is initialized as a hybrid sparse graph encoding both explicit word-to-word contextual correlations and latent word-to-character correlations within the given corpus. The UCAG model then diffuses these initial sparse correlations throughout the graph by parallel affinity propagation. This yields a dense, reliable, and corpus-specific lexical semantic similarity measure derived purely from unlabeled data. We then generalize the classical cosine similarity metric so that it takes these soft word similarities into account for story segmentation. Extensive experiments on benchmark datasets show that the proposed similarity measure outperforms previous measures. Specifically, our similarity measure yields an average relative F1-score improvement of 7.7% over state-of-the-art normalized cuts (NCuts) based story segmentation on two holistic benchmark Mandarin broadcast news corpora, TDT2 and CCTV, and a 10.8% relative F1-score improvement on the detailed broadcast news subsets.
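The abstract describes two technical steps: diffusing sparse word correlations over a graph into a dense similarity matrix, and generalizing cosine similarity so it uses those soft word similarities. The sketch below illustrates both ideas under stated assumptions and is not the authors' implementation. `diffuse` uses a generic symmetric diffusion rule chosen for illustration, not the UCAG parallel affinity propagation from the paper. `soft_cosine` uses the common soft-cosine form aᵀSb / √(aᵀSa · bᵀSb), which may differ from the paper's exact generalization. All function names and parameters (`alpha`, `iters`) are hypothetical.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain nested-loop matrix product, so the sketch needs no external packages.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def diffuse(w0: Matrix, alpha: float = 0.5, iters: int = 30) -> Matrix:
    # Hypothetical diffusion (NOT the paper's UCAG update rule):
    #   S <- alpha * P S P^T + (1 - alpha) * (I + W0)
    # P is the row-normalised sparse affinity matrix W0 (symmetric input assumed).
    # Each entry of P S P^T is a convex combination of entries of S, so with
    # alpha < 1 the iteration is a contraction and settles to a fixed point.
    n = len(w0)
    base = [[w0[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    p = []
    for row in w0:
        total = sum(row)
        p.append([x / total if total > 0 else 0.0 for x in row])
    pt = transpose(p)
    s = [row[:] for row in base]
    for _ in range(iters):
        spread = matmul(matmul(p, s), pt)
        s = [[alpha * spread[i][j] + (1 - alpha) * base[i][j] for j in range(n)]
             for i in range(n)]
    return s


def soft_cosine(a: list[float], b: list[float], s: Matrix) -> float:
    # Soft cosine: a^T S b / sqrt(a^T S a * b^T S b).
    # With S = identity this reduces to ordinary cosine similarity.
    # Assumes nonnegative term-count vectors and a nonnegative S, so the
    # quadratic forms under the square root cannot be negative.
    def form(x: list[float], y: list[float]) -> float:
        return sum(x[i] * s[i][j] * y[j]
                   for i in range(len(x)) for j in range(len(y)))

    denom = math.sqrt(form(a, a) * form(b, b))
    return form(a, b) / denom if denom > 0 else 0.0
```

For example, with a three-word chain graph where word 0 co-occurs only with word 1 and word 1 only with word 2, the initial matrix has no link between words 0 and 2. After diffusion that entry becomes positive, so two text blocks sharing no words, one containing only word 0 and the other only word 2, get a nonzero soft-cosine score, whereas plain cosine (S = identity) scores them 0.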

Original language: English
Pages (from-to): 236-247
Number of pages: 12
Journal: Neurocomputing
Volume: 318
DOI
Publication status: Published - 27 Nov 2018
