Measuring semantic similarity by contextual word connections in Chinese news story segmentation

Xuecheng Nie, Wei Feng, Liang Wan, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

9 Scopus citations

Abstract

Much recent work in story segmentation focuses on developing better partitioning criteria to segment news transcripts into sequences of topically coherent stories, while simply relying on repetition-based hard word-level similarities and ignoring the semantic correlations between different words. In this paper, we propose a purely data-driven approach to measuring soft semantic word- and sentence-level similarity from a given corpus, without the guidance of linguistic knowledge, ground-truth topic labeling, or story boundaries. We show that contextual word connections can help to produce semantically meaningful similarity measurements between any pair of Chinese words. Based on this, we further use a parallel all-pair SimRank algorithm to propagate such contextual similarities throughout the whole vocabulary. The resulting word semantic similarity matrix is then used to refine the classical cosine similarity measurement of sentences. Experiments on benchmark Chinese news corpora show that story segmentation using the proposed soft semantic similarity measurement always produces better segmentation accuracy than using the hard similarity. Specifically, we achieve a 3%-10% average F1-measure improvement to state-of-the-art NCuts-based story segmentation.
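
The two steps named in the abstract, similarity propagation with SimRank and a similarity-weighted cosine between sentences, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the textbook sequential SimRank recurrence rather than the parallel all-pair variant, and the word graph `neighbors`, the decay factor, the iteration count, and the particular form of `soft_cosine` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def simrank(neighbors, decay=0.8, iterations=10):
    """Textbook SimRank over a word graph (illustrative, sequential).

    neighbors[i] lists the indices of words connected to word i.
    How such connections are built from a corpus is assumed here.
    """
    n = len(neighbors)
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        new = [[0.0] * n for _ in range(n)]
        for a in range(n):
            new[a][a] = 1.0  # every word is fully similar to itself
            for b in range(a + 1, n):
                if not neighbors[a] or not neighbors[b]:
                    continue
                # Average similarity over all pairs of neighbours of a and b.
                total = sum(sim[i][j] for i in neighbors[a] for j in neighbors[b])
                score = decay * total / (len(neighbors[a]) * len(neighbors[b]))
                new[a][b] = new[b][a] = score
        sim = new
    return sim


def soft_dot(u, v, sim):
    """Bilinear form u^T S v over term-frequency vectors u and v."""
    return sum(u[i] * sim[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))


def soft_cosine(u, v, sim):
    """Cosine similarity with word-to-word similarities folded in (assumed form)."""
    denom = math.sqrt(soft_dot(u, u, sim) * soft_dot(v, v, sim))
    return soft_dot(u, v, sim) / denom if denom > 0 else 0.0


# Hypothetical 4-word vocabulary: words 0, 1 and 3 each connect only to word 2.
neighbors = [[2], [2], [0, 1, 3], [2]]
word_sim = simrank(neighbors)

# Two one-word "sentences" with no word in common: hard cosine would be 0,
# but the soft measure is positive because words 0 and 1 share a context.
sentence_a = [1, 0, 0, 0]
sentence_b = [0, 1, 0, 0]
print(soft_cosine(sentence_a, sentence_b, word_sim))
```

With an identity matrix in place of `word_sim`, `soft_cosine` reduces to the classical cosine similarity, which is the sense in which a word similarity matrix "refines" the hard measure.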

Original language: English
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages: 8312-8316
Number of pages: 5
State: Published - 18 Oct 2013
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: 26 May 2013 to 31 May 2013

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Country/Territory: Canada
City: Vancouver, BC
Period: 26/05/13 to 31/05/13

Keywords

  • contextual word connections
  • Semantic similarity
  • similarity propagation
  • story segmentation
