Laplacian eigenmaps for automatic story segmentation of broadcast news

Lei Xie, Lilei Zheng, Zihan Liu, Yanning Zhang

Research output: Contribution to journal › Article › peer-review

34 Scopus citations

Abstract

We propose Laplacian Eigenmaps (LE)-based approaches to automatic story segmentation on speech recognition transcripts of broadcast news. We reinforce story boundaries by applying LE analysis to the sentence connective strength matrix, revealing the intrinsic geometric structure of stories. Specifically, we construct a Euclidean space in which each sentence is mapped to a vector. As a result, the original inter-sentence connective strength is reflected by the Euclidean distances between the corresponding vectors, and cohesive relations between sentences become geometrically evident. Taking advantage of LE, we present three story segmentation approaches: LE-TextTiling, spectral clustering and LE-DP. In LE-DP, we formalize story segmentation as a straightforward criterion minimization problem and give a fast dynamic programming solution to it. Extensive story segmentation experiments on three corpora demonstrate that the proposed LE-based approaches achieve superior performance and significantly outperform several state-of-the-art methods. For instance, LE-TextTiling obtains a relative F1-measure increase of 17.8% on the CCTV Mandarin BN corpus compared to conventional TextTiling, and LE-DP achieves a high F1-measure of 0.7460 on the TDT2 Mandarin BN corpus, significantly outperforming a recent CRF-prosody approach with an F1-measure of 0.6783.
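To make the core idea concrete, below is a minimal sketch (not the authors' code) of an LE-style pipeline: embed sentences from an inter-sentence connective strength matrix via Laplacian Eigenmaps, then score candidate boundaries TextTiling-style by comparing the embedded vectors on either side of each sentence gap. The random toy term vectors, the cosine-style strength matrix W, the window size w and the embedding dimensionality are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(W, dim=3):
    """Map n sentences to dim-D vectors from an n x n connective strength matrix W."""
    D = np.diag(W.sum(axis=1))
    L = D - W                       # unnormalized graph Laplacian
    # Generalized eigenproblem L v = lambda * D v; drop the trivial constant eigenvector.
    vals, vecs = eigh(L, D)
    return vecs[:, 1:dim + 1]

def depth_scores(X, w=3):
    """TextTiling-style gap scores: low cosine similarity between the mean of the
    w embedded sentences before and after a gap suggests a story boundary."""
    n = len(X)
    scores = np.zeros(n - 1)
    for i in range(1, n):
        left = X[max(0, i - w):i].mean(axis=0)
        right = X[i:i + w].mean(axis=0)
        cos = left @ right / (np.linalg.norm(left) * np.linalg.norm(right) + 1e-12)
        scores[i - 1] = 1.0 - cos   # higher score = more likely a story boundary
    return scores

# Toy usage: in practice W would come from ASR transcript sentences
# (e.g. similarity of term vectors); here it is random for illustration.
rng = np.random.default_rng(0)
terms = rng.random((12, 50))                  # 12 "sentences", 50 "terms"
W = terms @ terms.T
np.fill_diagonal(W, 0.0)
X = laplacian_eigenmaps(W, dim=3)
print(np.argsort(depth_scores(X))[::-1][:3])  # indices of top candidate boundary gaps
```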

Original language: English
Article number: 5934585
Pages (from-to): 276-289
Number of pages: 14
Journal: IEEE Transactions on Audio, Speech, and Language Processing
Volume: 20
Issue number: 1
DOIs
State: Published - 2012

Keywords

  • Laplacian Eigenmaps (LE)
  • spoken document retrieval
  • story segmentation
  • topic segmentation
