Mandarin speech pattern discovery using segmental dynamic time warping and posteriorgram features

Peng Yang; Lei Xie; Hongjie Chen

School of Computer Science

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Speech pattern discovery aims to identify repeated patterns (e.g., word-like units) from speech. This study analyzes speech patterns in a Mandarin speech corpus using segmental dynamic time warping (SDTW). Mel frequency cepstral coefficients (MFCCs) have not been effective for pattern discovery in multi-speaker conditions. The phoneme posteriorgram features are used here in a template-based method. Tests show that phoneme posteriorgram is significantly better than MFCCs for both single- and multi-speaker conditions. The performance upper-bound of SDTW is also investigated when boundary information is available with the segments divided by word boundaries. The results show that the boundaries significantly improve the pattern discovery in terms of both accuracy and efficiency.
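The template-based matching described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the frame distance (negative log of the inner product of two posterior vectors), the band width, the spacing of start offsets, and all function names are assumptions chosen for illustration. It shows the general idea of segmental DTW, which runs band-constrained DTW from several diagonal start points and keeps the lowest-distortion alignment.

```python
import math

def posterior_distance(p, q, eps=1e-10):
    """Frame distance between two posterior vectors (assumed: -log inner product)."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))

def banded_dtw(x, y, band):
    """DTW from (0, 0) restricted to |i - j| <= band.

    The path may end wherever it reaches the last frame of either sequence.
    Returns the accumulated distance divided by path length, or inf if no
    path stays inside the band.
    """
    n, m = len(x), len(y)
    inf = math.inf
    cost = [[inf] * m for _ in range(n)]
    steps = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(max(0, i - band), min(m, i + band + 1)):
            d = posterior_distance(x[i], y[j])
            if i == 0 and j == 0:
                cost[i][j], steps[i][j] = d, 1
                continue
            best, best_steps = inf, 0
            # Standard DTW predecessors: diagonal, vertical, horizontal.
            for pi, pj in ((i - 1, j - 1), (i - 1, j), (i, j - 1)):
                if pi >= 0 and pj >= 0 and cost[pi][pj] < best:
                    best, best_steps = cost[pi][pj], steps[pi][pj]
            if best < inf:
                cost[i][j], steps[i][j] = best + d, best_steps + 1
    ends = [(i, m - 1) for i in range(n)] + [(n - 1, j) for j in range(m)]
    scores = [cost[i][j] / steps[i][j] for i, j in ends if cost[i][j] < inf]
    return min(scores) if scores else inf

def segmental_dtw(x, y, band=1):
    """Try several diagonal start offsets; return (score, start_x, start_y) of the best."""
    shift = 2 * band + 1  # illustrative spacing between start points
    best = (math.inf, 0, 0)
    for sx in range(0, len(x), shift):
        best = min(best, (banded_dtw(x[sx:], y, band), sx, 0))
    for sy in range(shift, len(y), shift):
        best = min(best, (banded_dtw(x, y[sy:], band), 0, sy))
    return best

# Toy one-hot "posteriorgrams" over three hypothetical phoneme classes.
a, b, c = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
query = [a, b, c]
utterance = [c, c, c, a, b, c]
result = segmental_dtw(query, utterance, band=1)
```

In this toy example the query pattern recurs in the utterance starting at frame 3, so the best alignment is expected to start at offset `(0, 3)` with near-zero distortion. Real posteriorgrams would come from a phoneme recognizer, which is outside the scope of this sketch.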

Original language	English
Pages (from-to)	903-907
Number of pages	5
Journal	Qinghua Daxue Xuebao/Journal of Tsinghua University
Volume	53
Issue number	6
Publication status	Published - 2013

Other files and links

Link to publication in Scopus

Cite this

@article{4097f57fa43b4c6182fe763d43f343ea,

title = "Mandarin speech pattern discovery using segmental dynamic time warping and posteriorgram features",

abstract = "Speech pattern discovery aims to identify repeated patterns (e.g., word-like units) from speech. This study analyzes speech patterns in a Mandarin speech corpus using segmental dynamic time warping (SDTW). Mel frequency cepstral coefficients (MFCCs) have not been effective for pattern discovery in multi-speaker conditions. The phoneme posteriorgram features are used here in a template-based method. Tests show that phoneme posteriorgram is significantly better than MFCCs for both single- and multi-speaker conditions. The performance upper-bound of SDTW is also investigated when boundary information is available with the segments divided by word boundaries. The results show that the boundaries significantly improve the pattern discovery in terms of both accuracy and efficiency.",

keywords = "Dynamic time warping (DTW), Posteriorgram, Segmental dynamic time warping (SDTW), Speech pattern discovery",

author = "Peng Yang and Lei Xie and Hongjie Chen",

year = "2013",

language = "English",

volume = "53",

pages = "903--907",

journal = "Qinghua Daxue Xuebao/Journal of Tsinghua University",

issn = "1000-0054",

publisher = "Tsinghua University Press",

number = "6",

}

TY - JOUR

T1 - Mandarin speech pattern discovery using segmental dynamic time warping and posteriorgram features

AU - Yang, Peng

AU - Xie, Lei

AU - Chen, Hongjie

PY - 2013

Y1 - 2013

N2 - Speech pattern discovery aims to identify repeated patterns (e.g., word-like units) from speech. This study analyzes speech patterns in a Mandarin speech corpus using segmental dynamic time warping (SDTW). Mel frequency cepstral coefficients (MFCCs) have not been effective for pattern discovery in multi-speaker conditions. The phoneme posteriorgram features are used here in a template-based method. Tests show that phoneme posteriorgram is significantly better than MFCCs for both single- and multi-speaker conditions. The performance upper-bound of SDTW is also investigated when boundary information is available with the segments divided by word boundaries. The results show that the boundaries significantly improve the pattern discovery in terms of both accuracy and efficiency.

AB - Speech pattern discovery aims to identify repeated patterns (e.g., word-like units) from speech. This study analyzes speech patterns in a Mandarin speech corpus using segmental dynamic time warping (SDTW). Mel frequency cepstral coefficients (MFCCs) have not been effective for pattern discovery in multi-speaker conditions. The phoneme posteriorgram features are used here in a template-based method. Tests show that phoneme posteriorgram is significantly better than MFCCs for both single- and multi-speaker conditions. The performance upper-bound of SDTW is also investigated when boundary information is available with the segments divided by word boundaries. The results show that the boundaries significantly improve the pattern discovery in terms of both accuracy and efficiency.

KW - Dynamic time warping (DTW)

KW - Posteriorgram

KW - Segmental dynamic time warping (SDTW)

KW - Speech pattern discovery

UR - http://www.scopus.com/inward/record.url?scp=84886431783&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84886431783

SN - 1000-0054

VL - 53

SP - 903

EP - 907

JO - Qinghua Daxue Xuebao/Journal of Tsinghua University

JF - Qinghua Daxue Xuebao/Journal of Tsinghua University

IS - 6

ER -
