Measuring semantic similarity by contextual word connections in Chinese news story segmentation

Xuecheng Nie; Wei Feng; Liang Wan; Lei Xie

doi:10.1109/ICASSP.2013.6639286

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

Much recent work in story segmentation focuses on developing better partitioning criteria for segmenting news transcripts into sequences of topically coherent stories, while simply relying on repetition-based hard word-level similarities and ignoring the semantic correlations between different words. In this paper, we propose a purely data-driven approach to measuring soft semantic word- and sentence-level similarity from a given corpus, without the guidance of linguistic knowledge, ground-truth topic labeling, or story boundaries. We show that contextual word connections can help to produce semantically meaningful similarity measurements between any pair of Chinese words. Based on this, we further use a parallel all-pair SimRank algorithm to propagate such contextual similarities throughout the whole vocabulary. The resulting word semantic similarity matrix is then used to refine the classical cosine similarity measurement of sentences. Experiments on benchmark Chinese news corpora show that story segmentation using the proposed soft semantic similarity measurement always produces better segmentation accuracy than using the hard similarity. Specifically, we achieve a 3%-10% average F1-measure improvement over state-of-the-art NCuts-based story segmentation.
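
The pipeline outlined in the abstract (contextual word connections, SimRank propagation, similarity-refined sentence cosine) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how contextual connections are built, how the parallel all-pair SimRank is organized, or exactly how the cosine measure is refined. The sketch assumes that two words are connected when they co-occur in a sentence, uses the textbook SimRank iteration with a hypothetical decay of 0.8, and substitutes a generic soft cosine for the refined sentence similarity. All function names and the toy English corpus are illustrative.

```python
from itertools import combinations
from math import sqrt


def build_graph(sentences):
    """Connect every pair of distinct words that co-occur in a sentence.

    Assumption: same-sentence co-occurrence stands in for the paper's
    "contextual word connections", which the abstract does not define.
    """
    neighbors = {}
    for words in sentences:
        for w in words:
            neighbors.setdefault(w, set())
        for a, b in combinations(set(words), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def simrank(neighbors, decay=0.8, iterations=5):
    """Textbook all-pair SimRank (sequential, not the paper's parallel variant)."""
    vocab = sorted(neighbors)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in vocab} for a in vocab}
    for _ in range(iterations):
        new = {a: {} for a in vocab}
        for a in vocab:
            for b in vocab:
                if a == b:
                    new[a][b] = 1.0
                elif not neighbors[a] or not neighbors[b]:
                    new[a][b] = 0.0
                else:
                    # Average similarity of the two neighbor sets, damped by decay.
                    total = sum(sim[i][j] for i in neighbors[a] for j in neighbors[b])
                    new[a][b] = decay * total / (len(neighbors[a]) * len(neighbors[b]))
        sim = new
    return sim


def soft_cosine(x, y, sim):
    """Cosine similarity of two token lists, weighted by word similarity.

    Assumption: a generic soft cosine, x^T S y / sqrt(x^T S x * y^T S y),
    used here as a stand-in for the paper's refined sentence similarity.
    """
    def bilinear(u, v):
        return sum(sim[a][b] for a in u for b in v if a in sim and b in sim[a])

    denom = sqrt(bilinear(x, x) * bilinear(y, y))
    return bilinear(x, y) / denom if denom > 0 else 0.0


# Toy corpus: "stock" and "share" never co-occur but share the context "market".
corpus = [
    ["stock", "market", "rises"],
    ["share", "market", "falls"],
    ["rain", "storm", "coast"],
]
sim = simrank(build_graph(corpus))
score = soft_cosine(["stock", "rises"], ["share", "falls"], sim)
```

In this toy corpus the two market sentences have no words in common, so a hard, repetition-based cosine would score them as unrelated, while the propagated similarities give them a positive score. Words from the unrelated weather sentence keep a similarity of zero to the market vocabulary.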

Original language	English
Title of host publication	2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages	8312-8316
Number of pages	5
DOIs	https://doi.org/10.1109/ICASSP.2013.6639286
State	Published - 18 Oct 2013
Event	2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada Duration: 26 May 2013 → 31 May 2013

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print)	1520-6149

Conference

Conference	2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Country/Territory	Canada
City	Vancouver, BC
Period	26/05/13 → 31/05/13

Keywords

contextual word connections
Semantic similarity
similarity propagation
story segmentation

Access to Document

10.1109/ICASSP.2013.6639286

Cite this

Nie, X., Feng, W., Wan, L., & Xie, L. (2013). Measuring semantic similarity by contextual word connections in Chinese news story segmentation. In 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings (pp. 8312-8316). Article 6639286 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). https://doi.org/10.1109/ICASSP.2013.6639286

@inproceedings{6bbee0b3f7c646fdbb78f072b0e9d46d,

title = "Measuring semantic similarity by contextual word connections in Chinese news story segmentation",

abstract = "A lot of recent work in story segmentation focuses on developing better partitioning criteria to segment news transcripts into sequences of topically coherent stories, while simply relying on the repetition based hard word-level similarities and ignoring the semantic correlations between different words. In this paper, we propose a purely data-driven approach to measuring soft semantic word- and sentence-level similarity from a given corpus, without the guidance of linguistic knowledge, ground-truth topic labeling or story boundaries. We show that contextual word connections can help to produce semantically meaningful similarity measurement between any pair of Chinese words. Based on this, we further use a parallel all-pair SimRank algorithm to propagate such contextual similarities throughout the whole vocabulary. The resultant word semantic similarity matrix is then used to refine the classical cosine similarity measurement of sentences. Experiments on benchmark Chinese news corpora show that, story segmentation using the proposed soft semantic similarity measurement can always produce better segmentation accuracy than using the hard similarity. Specifically, we can achieve 3%-10% average F1-measure improvement to state-of-the-art NCuts based story segmentation.",

keywords = "contextual word connections, Semantic similarity, similarity propagation, story segmentation",

author = "Xuecheng Nie and Wei Feng and Liang Wan and Lei Xie",

year = "2013",

month = oct,

day = "18",

doi = "10.1109/ICASSP.2013.6639286",

language = "English",

isbn = "9781479903566",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

pages = "8312--8316",

booktitle = "2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings",

note = "2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 ; Conference date: 26-05-2013 Through 31-05-2013",

}

Nie, X, Feng, W, Wan, L & Xie, L 2013, Measuring semantic similarity by contextual word connections in Chinese news story segmentation. in 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings., 6639286, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 8312-8316, 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, 26/05/13. https://doi.org/10.1109/ICASSP.2013.6639286

Measuring semantic similarity by contextual word connections in Chinese news story segmentation. / Nie, Xuecheng; Feng, Wei; Wan, Liang et al.
2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings. 2013. p. 8312-8316 6639286 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

TY - GEN

T1 - Measuring semantic similarity by contextual word connections in Chinese news story segmentation

AU - Nie, Xuecheng

AU - Feng, Wei

AU - Wan, Liang

AU - Xie, Lei

PY - 2013/10/18

Y1 - 2013/10/18

N2 - A lot of recent work in story segmentation focuses on developing better partitioning criteria to segment news transcripts into sequences of topically coherent stories, while simply relying on the repetition based hard word-level similarities and ignoring the semantic correlations between different words. In this paper, we propose a purely data-driven approach to measuring soft semantic word- and sentence-level similarity from a given corpus, without the guidance of linguistic knowledge, ground-truth topic labeling or story boundaries. We show that contextual word connections can help to produce semantically meaningful similarity measurement between any pair of Chinese words. Based on this, we further use a parallel all-pair SimRank algorithm to propagate such contextual similarities throughout the whole vocabulary. The resultant word semantic similarity matrix is then used to refine the classical cosine similarity measurement of sentences. Experiments on benchmark Chinese news corpora show that, story segmentation using the proposed soft semantic similarity measurement can always produce better segmentation accuracy than using the hard similarity. Specifically, we can achieve 3%-10% average F1-measure improvement to state-of-the-art NCuts based story segmentation.

AB - A lot of recent work in story segmentation focuses on developing better partitioning criteria to segment news transcripts into sequences of topically coherent stories, while simply relying on the repetition based hard word-level similarities and ignoring the semantic correlations between different words. In this paper, we propose a purely data-driven approach to measuring soft semantic word- and sentence-level similarity from a given corpus, without the guidance of linguistic knowledge, ground-truth topic labeling or story boundaries. We show that contextual word connections can help to produce semantically meaningful similarity measurement between any pair of Chinese words. Based on this, we further use a parallel all-pair SimRank algorithm to propagate such contextual similarities throughout the whole vocabulary. The resultant word semantic similarity matrix is then used to refine the classical cosine similarity measurement of sentences. Experiments on benchmark Chinese news corpora show that, story segmentation using the proposed soft semantic similarity measurement can always produce better segmentation accuracy than using the hard similarity. Specifically, we can achieve 3%-10% average F1-measure improvement to state-of-the-art NCuts based story segmentation.

KW - contextual word connections

KW - Semantic similarity

KW - similarity propagation

KW - story segmentation

UR - http://www.scopus.com/inward/record.url?scp=84890497890&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2013.6639286

DO - 10.1109/ICASSP.2013.6639286

M3 - Conference contribution

AN - SCOPUS:84890497890

SN - 9781479903566

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 8312

EP - 8316

BT - 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings

T2 - 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013

Y2 - 26 May 2013 through 31 May 2013

ER -

Nie X, Feng W, Wan L, Xie L. Measuring semantic similarity by contextual word connections in Chinese news story segmentation. In 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings. 2013. p. 8312-8316. 6639286. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP.2013.6639286
