Prosody boundary detection through context-dependent position models

Yue Ning Hu; Min Chu; Chao Huang; Yan Ning Zhang

School of Computer Science

Research output: Contribution to journal › Conference article › Peer-reviewed

1 citation (Scopus)

Abstract

In this paper, we propose to convert the prosody boundary detection task into a syllable position labeling task. In order to detect both prosodic word and prosodic phrase boundaries, six types of syllable positions are defined. For each position, context-dependent position models are trained from manually labeled data. These models are used to label syllable positions in unseen speech. Word and phrase boundaries are then easily derived from the syllable position labels. The proposed approach is tested on a large-scale single-speaker database. The precision and recall are 96.1% and 90.1%, respectively, for word boundaries and 83.7% and 80.5%, respectively, for phrase boundaries. Results of a listening test show that only 28% of the word boundary errors and 50% of the phrase boundary errors in automatic detection are critical errors, implying only about 2.2% and 10% critical errors for word and phrase boundaries, respectively. The results are rather good, especially considering that only acoustic features are used in this work.
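
As a rough illustration of the step "boundaries are derived from syllable position labels", the sketch below maps a sequence of position labels to word and phrase boundaries and scores them with precision and recall. The abstract does not list the six position types, so the label names used here (`PH`, `PT`, `WH`, `WM`, `WT`, `MONO`) and both function names are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
# Hypothetical six-way syllable position labels (NOT taken from the paper):
#   PH = phrase-initial, PT = phrase-final, WH = word-initial,
#   WM = word-medial,    WT = word-final,   MONO = one-syllable word.
WORD_START = {"WH", "MONO", "PH"}
WORD_END = {"WT", "MONO", "PT"}
PHRASE_START = {"PH"}
PHRASE_END = {"PT"}


def derive_boundaries(labels):
    """Return (word_boundaries, phrase_boundaries) as sets of indices i,
    where index i means a boundary between syllable i and syllable i + 1."""
    word, phrase = set(), set()
    for i in range(len(labels) - 1):
        cur, nxt = labels[i], labels[i + 1]
        # A boundary is implied by an "end" label on the left
        # or a "start" label on the right.
        if cur in PHRASE_END or nxt in PHRASE_START:
            phrase.add(i)
        if cur in WORD_END or nxt in WORD_START:
            word.add(i)
    return word, phrase


def precision_recall(predicted, reference):
    """Precision and recall of predicted boundary indices against a reference set."""
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Toy label sequence: two phrases, the first containing two words.
labels = ["PH", "WT", "WH", "WM", "PT", "PH", "PT"]
word_b, phrase_b = derive_boundaries(labels)
print(sorted(word_b), sorted(phrase_b))
```

In this scheme every phrase boundary is also a word boundary, because the phrase-level labels are included in the word-level start and end sets.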

Original language	English
Pages (from-to)	2142-2145
Number of pages	4
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status	Published - 2008
Event	INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association - Brisbane, QLD, Australia. Duration: 22 Sep 2008 → 26 Sep 2008

Cite this

@article{d62ab80848ff40bdab67451ec563ece9,

title = "Prosody boundary detection through context-dependent position models",

abstract = "In this paper, we propose to convert the prosody boundary detection task into a syllable position labeling task. In order to detect both prosodic word and prosodic phrase boundaries, six types of syllable positions are defined. For each position, context-dependent position models are trained from manually labeled data. These models are used to label syllable positions in unseen speech. Word and phrase boundaries are then easily derived from the syllable position labels. The proposed approach is tested on a large-scale single-speaker database. The precision and recall are 96.1% and 90.1%, respectively, for word boundaries and 83.7% and 80.5%, respectively, for phrase boundaries. Results of a listening test show that only 28% of the word boundary errors and 50% of the phrase boundary errors in automatic detection are critical errors, implying only about 2.2% and 10% critical errors for word and phrase boundaries, respectively. The results are rather good, especially considering that only acoustic features are used in this work.",

keywords = "Boundary detection, Context-dependent position model, Phrase, Prosodic word",

author = "Hu, {Yue Ning} and Min Chu and Chao Huang and Zhang, {Yan Ning}",

year = "2008",

language = "English",

pages = "2142--2145",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

note = "INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association ; Conference date: 22-09-2008 Through 26-09-2008",

}

TY - JOUR

T1 - Prosody boundary detection through context-dependent position models

AU - Hu, Yue Ning

AU - Chu, Min

AU - Huang, Chao

AU - Zhang, Yan Ning

PY - 2008

Y1 - 2008

N2 - In this paper, we propose to convert the prosody boundary detection task into a syllable position labeling task. In order to detect both prosodic word and prosodic phrase boundaries, six types of syllable positions are defined. For each position, context-dependent position models are trained from manually labeled data. These models are used to label syllable positions in unseen speech. Word and phrase boundaries are then easily derived from the syllable position labels. The proposed approach is tested on a large-scale single-speaker database. The precision and recall are 96.1% and 90.1%, respectively, for word boundaries and 83.7% and 80.5%, respectively, for phrase boundaries. Results of a listening test show that only 28% of the word boundary errors and 50% of the phrase boundary errors in automatic detection are critical errors, implying only about 2.2% and 10% critical errors for word and phrase boundaries, respectively. The results are rather good, especially considering that only acoustic features are used in this work.

AB - In this paper, we propose to convert the prosody boundary detection task into a syllable position labeling task. In order to detect both prosodic word and prosodic phrase boundaries, six types of syllable positions are defined. For each position, context-dependent position models are trained from manually labeled data. These models are used to label syllable positions in unseen speech. Word and phrase boundaries are then easily derived from the syllable position labels. The proposed approach is tested on a large-scale single-speaker database. The precision and recall are 96.1% and 90.1%, respectively, for word boundaries and 83.7% and 80.5%, respectively, for phrase boundaries. Results of a listening test show that only 28% of the word boundary errors and 50% of the phrase boundary errors in automatic detection are critical errors, implying only about 2.2% and 10% critical errors for word and phrase boundaries, respectively. The results are rather good, especially considering that only acoustic features are used in this work.

KW - Boundary detection

KW - Context-dependent position model

KW - Phrase

KW - Prosodic word

UR - http://www.scopus.com/inward/record.url?scp=84867209316&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84867209316

SN - 2308-457X

SP - 2142

EP - 2145

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association

Y2 - 22 September 2008 through 26 September 2008

ER -