Unsupervised measure of Chinese lexical semantic similarity using correlated graph model for news story segmentation

Wei Feng; Xuecheng Nie; Yujun Zhang; Lei Xie; Jianwu Dang

doi:10.1016/j.neucom.2018.08.061

School of Computer Science

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

This paper presents a simple yet effective unsupervised approach to measuring Chinese lexical semantic similarity and demonstrates its performance in automatic story segmentation of Mandarin broadcast news. The approach centers on the unsupervised correlated affinity graph (UCAG) model, which is initialized as a hybrid sparse graph that encodes both explicit word-to-word contextual correlations and latent word-to-character correlations within the given corpus. The UCAG model then diffuses the initial sparse correlations throughout the graph by parallel affinity propagation. The result is a dense, reliable, corpus-specific lexical semantic similarity measure derived purely from unlabeled data. We then generalize the classical cosine similarity metric to take soft similarities into account for story segmentation. Extensive experiments on benchmark datasets show that the proposed similarity measure outperforms previous measures. Specifically, it yields an average relative F1-score improvement of 7.7% over state-of-the-art normalized cuts (NCuts) based story segmentation on two holistic benchmark Mandarin broadcast news corpora, TDT2 and CCTV, and a 10.8% relative F1-score improvement on the detailed broadcast news subsets.
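As a rough illustration of what "taking soft similarities into account" can mean for cosine similarity, the sketch below uses the widely known soft cosine form, sim(a, b) = aᵀSb / sqrt(aᵀSa · bᵀSb), where S is a word-to-word similarity matrix. This is an assumption for illustration only: the abstract does not give the paper's exact generalization, and the function name, the three-word vocabulary, and the values in S are hypothetical stand-ins for a UCAG-derived similarity measure.

```python
import math


def soft_cosine(a, b, S):
    """Soft cosine of term-frequency vectors a and b under similarity matrix S.

    With S equal to the identity matrix this reduces to classical cosine
    similarity. Assumed form, not taken from the paper.
    """
    def bilinear(x, y):
        # Computes x^T S y with plain Python lists.
        return sum(x[i] * S[i][j] * y[j]
                   for i in range(len(x))
                   for j in range(len(y)))

    denom = math.sqrt(bilinear(a, a) * bilinear(b, b))
    return bilinear(a, b) / denom if denom > 0 else 0.0


# Hypothetical 3-word vocabulary in which words 0 and 1 are near-synonyms.
S_identity = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]]
S_soft = [[1.0, 0.8, 0.0],
          [0.8, 1.0, 0.0],
          [0.0, 0.0, 1.0]]

sent_a = [1, 0, 0]  # sentence containing only word 0
sent_b = [0, 1, 0]  # sentence containing only word 1

hard = soft_cosine(sent_a, sent_b, S_identity)  # no shared words
soft = soft_cosine(sent_a, sent_b, S_soft)      # related words count partially
```

With the identity matrix the two sentences share no words and score zero, while the soft matrix lets the related word pair contribute, which is the kind of effect a dense lexical similarity measure is meant to bring to segmentation.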

Original language	English
Pages (from-to)	236-247
Number of pages	12
Journal	Neurocomputing
Volume	318
DOIs	https://doi.org/10.1016/j.neucom.2018.08.061
State	Published - 27 Nov 2018

Keywords

Common character correlation
Contextual correlation
Generalized cosine similarity
Parallel affinity propagation
Story segmentation
Unsupervised correlated affinity graph (UCAG) model

Cite this

@article{654db722933c441880bc47aa55f4d00a,

title = "Unsupervised measure of Chinese lexical semantic similarity using correlated graph model for news story segmentation",

abstract = "This paper presents a simple yet effective unsupervised approach to measuring Chinese lexical semantic similarity and demonstrates its performance in automatic story segmentation of Mandarin broadcast news. The approach centers on the unsupervised correlated affinity graph (UCAG) model, which is initialized as a hybrid sparse graph that encodes both explicit word-to-word contextual correlations and latent word-to-character correlations within the given corpus. The UCAG model then diffuses the initial sparse correlations throughout the graph by parallel affinity propagation. The result is a dense, reliable, corpus-specific lexical semantic similarity measure derived purely from unlabeled data. We then generalize the classical cosine similarity metric to take soft similarities into account for story segmentation. Extensive experiments on benchmark datasets show that the proposed similarity measure outperforms previous measures. Specifically, it yields an average relative F1-score improvement of 7.7% over state-of-the-art normalized cuts (NCuts) based story segmentation on two holistic benchmark Mandarin broadcast news corpora, TDT2 and CCTV, and a 10.8% relative F1-score improvement on the detailed broadcast news subsets.",

keywords = "Common character correlation, Contextual correlation, Generalized cosine similarity, Parallel affinity propagation, Story segmentation, Unsupervised correlated affinity graph (UCAG) model",

author = "Wei Feng and Xuecheng Nie and Yujun Zhang and Lei Xie and Jianwu Dang",

note = "Publisher Copyright: {\textcopyright} 2018 Elsevier B.V.",

year = "2018",

month = nov,

day = "27",

doi = "10.1016/j.neucom.2018.08.061",

language = "English",

volume = "318",

pages = "236--247",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Unsupervised measure of Chinese lexical semantic similarity using correlated graph model for news story segmentation

AU - Feng, Wei

AU - Nie, Xuecheng

AU - Zhang, Yujun

AU - Xie, Lei

AU - Dang, Jianwu

PY - 2018/11/27

Y1 - 2018/11/27

AB - This paper presents a simple yet effective unsupervised approach to measuring Chinese lexical semantic similarity and demonstrates its performance in automatic story segmentation of Mandarin broadcast news. The approach centers on the unsupervised correlated affinity graph (UCAG) model, which is initialized as a hybrid sparse graph that encodes both explicit word-to-word contextual correlations and latent word-to-character correlations within the given corpus. The UCAG model then diffuses the initial sparse correlations throughout the graph by parallel affinity propagation. The result is a dense, reliable, corpus-specific lexical semantic similarity measure derived purely from unlabeled data. We then generalize the classical cosine similarity metric to take soft similarities into account for story segmentation. Extensive experiments on benchmark datasets show that the proposed similarity measure outperforms previous measures. Specifically, it yields an average relative F1-score improvement of 7.7% over state-of-the-art normalized cuts (NCuts) based story segmentation on two holistic benchmark Mandarin broadcast news corpora, TDT2 and CCTV, and a 10.8% relative F1-score improvement on the detailed broadcast news subsets.

KW - Common character correlation

KW - Contextual correlation

KW - Generalized cosine similarity

KW - Parallel affinity propagation

KW - Story segmentation

KW - Unsupervised correlated affinity graph (UCAG) model

UR - http://www.scopus.com/inward/record.url?scp=85053163510&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2018.08.061

DO - 10.1016/j.neucom.2018.08.061

M3 - Article

AN - SCOPUS:85053163510

SN - 0925-2312

VL - 318

SP - 236

EP - 247

JO - Neurocomputing

JF - Neurocomputing

ER -
