A deep neural network approach for sentence boundary detection in broadcast news

Chenglin Xu; Lei Xie; Guangpu Huang; Xiong Xiao; Eng Siong Chng; Haizhou Li

School of Computer Science

Research output: Contribution to journal › Conference article › Peer-reviewed

37 Citations (Scopus)

Abstract

This paper presents a deep neural network (DNN) approach to sentence boundary detection in broadcast news. We extract prosodic and lexical features at each inter-word position in the transcripts and learn a sequential classifier to label these positions as either boundary or non-boundary. This work is realized by a hybrid DNN-CRF (conditional random field) architecture. The DNN accepts prosodic feature inputs and non-linearly maps them into boundary/non-boundary posterior probability outputs. Subsequently, the posterior probabilities are combined with lexical features and the integrated features are modeled by a linear-chain CRF. The CRF finally labels the inter-word positions as boundary or non-boundary by Viterbi decoding. Experiments show that, as compared with the state-of-the-art DTCRF approach [1], the proposed DNN-CRF approach achieves 16.7% and 4.1% reduction in NIST boundary detection error in reference and speech recognition transcripts, respectively.
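The final labeling step described above, Viterbi decoding over a linear chain of inter-word positions, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the DNN's boundary posteriors are used directly as log emission scores, omits the lexical features, and uses a hand-set transition matrix in place of learned CRF weights. The function name `viterbi`, the example posteriors, and the penalty value are all hypothetical.

```python
import math

LABELS = ("non-boundary", "boundary")


def viterbi(emission_scores, transition_scores):
    """Return the highest-scoring label sequence for a linear chain.

    emission_scores: one [non_boundary_score, boundary_score] pair per
        inter-word position (here: log posteriors, an assumption).
    transition_scores: 2x2 matrix; entry [i][j] scores moving from
        label i to label j (hand-set here, not learned).
    """
    if not emission_scores:
        return []
    k = len(LABELS)
    best = [list(emission_scores[0])]  # best path score ending in each label
    back = []                          # back-pointers for positions 1..n-1
    for t in range(1, len(emission_scores)):
        row, ptr = [], []
        for j in range(k):
            # Pick the previous label that maximizes the score of reaching j.
            prev = max(range(k), key=lambda i: best[-1][i] + transition_scores[i][j])
            row.append(best[-1][prev] + transition_scores[prev][j] + emission_scores[t][j])
            ptr.append(prev)
        best.append(row)
        back.append(ptr)
    # Trace back from the best final label.
    path = [max(range(k), key=lambda j: best[-1][j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return [LABELS[i] for i in path]


# Hypothetical boundary posteriors for four inter-word positions.
posteriors = (0.1, 0.6, 0.7, 0.2)
emissions = [[math.log(1 - p), math.log(p)] for p in posteriors]
# Hypothetical transition scores that penalize two adjacent boundaries.
transitions = [[0.0, 0.0], [0.0, -3.0]]
print(viterbi(emissions, transitions))
```

With the penalty on consecutive boundaries, the decoder keeps only the stronger of the two adjacent boundary candidates, which is the kind of sequence-level effect a per-position threshold cannot produce.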

Original language	English
Pages (from-to)	2887-2891
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status	Published - 2014
Event	15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore. Duration: 14 Sep 2014 → 18 Sep 2014

Cite this

@article{add4d01ca90b4228ac306993fdfed39b,

title = "A deep neural network approach for sentence boundary detection in broadcast news",

abstract = "This paper presents a deep neural network (DNN) approach to sentence boundary detection in broadcast news. We extract prosodic and lexical features at each inter-word position in the transcripts and learn a sequential classifier to label these positions as either boundary or non-boundary. This work is realized by a hybrid DNN-CRF (conditional random field) architecture. The DNN accepts prosodic feature inputs and non-linearly maps them into boundary/non-boundary posterior probability outputs. Subsequently, the posterior probabilities are combined with lexical features and the integrated features are modeled by a linear-chain CRF. The CRF finally labels the inter-word positions as boundary or non-boundary by Viterbi decoding. Experiments show that, as compared with the state-of-the-art DTCRF approach [1], the proposed DNN-CRF approach achieves 16.7% and 4.1% reduction in NIST boundary detection error in reference and speech recognition transcripts, respectively.",

keywords = "Deep neural network, Rich transcription, Sentence boundary detection, Structural event detection",

author = "Chenglin Xu and Lei Xie and Guangpu Huang and Xiong Xiao and Chng, {Eng Siong} and Haizhou Li",

note = "Publisher Copyright: Copyright {\textcopyright} 2014 ISCA.; 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 ; Conference date: 14-09-2014 Through 18-09-2014",

year = "2014",

language = "English",

pages = "2887--2891",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - A deep neural network approach for sentence boundary detection in broadcast news

AU - Xu, Chenglin

AU - Xie, Lei

AU - Huang, Guangpu

AU - Xiao, Xiong

AU - Chng, Eng Siong

AU - Li, Haizhou

PY - 2014

Y1 - 2014

AB - This paper presents a deep neural network (DNN) approach to sentence boundary detection in broadcast news. We extract prosodic and lexical features at each inter-word position in the transcripts and learn a sequential classifier to label these positions as either boundary or non-boundary. This work is realized by a hybrid DNN-CRF (conditional random field) architecture. The DNN accepts prosodic feature inputs and non-linearly maps them into boundary/non-boundary posterior probability outputs. Subsequently, the posterior probabilities are combined with lexical features and the integrated features are modeled by a linear-chain CRF. The CRF finally labels the inter-word positions as boundary or non-boundary by Viterbi decoding. Experiments show that, as compared with the state-of-the-art DTCRF approach [1], the proposed DNN-CRF approach achieves 16.7% and 4.1% reduction in NIST boundary detection error in reference and speech recognition transcripts, respectively.

KW - Deep neural network

KW - Rich transcription

KW - Sentence boundary detection

KW - Structural event detection

UR - http://www.scopus.com/inward/record.url?scp=84910051184&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84910051184

SN - 2308-457X

SP - 2887

EP - 2891

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014

Y2 - 14 September 2014 through 18 September 2014

ER -
