Mandarin speech pattern discovery using segmental dynamic time warping and posteriorgram features

Peng Yang, Lei Xie, Hongjie Chen

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Speech pattern discovery aims to identify repeated patterns (e.g., word-like units) in speech. This study analyzes speech patterns in a Mandarin speech corpus using segmental dynamic time warping (SDTW). Because Mel-frequency cepstral coefficients (MFCCs) have not been effective for pattern discovery in multi-speaker conditions, phoneme posteriorgram features are used here in a template-based method. Tests show that phoneme posteriorgrams perform significantly better than MFCCs in both single- and multi-speaker conditions. The performance upper bound of SDTW is also investigated for the case where boundary information is available and the segments are divided at word boundaries. The results show that the boundaries significantly improve pattern discovery in terms of both accuracy and efficiency.
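
To make the template-matching idea concrete, the following is a minimal sketch of how two posteriorgram sequences might be compared with dynamic time warping. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name a frame-level distance, so the negative log inner product used here is an assumption, and the code performs plain whole-sequence DTW rather than the segmental variant (SDTW), which additionally restricts the warping path to diagonal bands so that matching sub-segments can be found. The function names and toy data are hypothetical.

```python
import math


def frame_distance(p, q, eps=1e-10):
    """Distance between two posterior vectors (assumed: negative log inner product)."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))  # eps guards against log(0)


def dtw_cost(x, y):
    """Length-normalised DTW alignment cost between two posteriorgrams.

    x and y are lists of frames; each frame is a list of phoneme
    posterior probabilities that sums to 1.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] holds the best accumulated cost aligning x[:i] with y[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(x[i - 1], y[j - 1])
            # Allowed steps: advance in x, advance in y, or advance in both.
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


# Toy posteriorgrams over three hypothetical phoneme classes.
a = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.05, 0.05, 0.9]]
# Same class sequence as `a`, with the first frame held longer.
b = [[0.9, 0.05, 0.05], [0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.05, 0.05, 0.9]]
# Same frames as `a` in reverse order.
c = [[0.05, 0.05, 0.9], [0.1, 0.8, 0.1], [0.9, 0.05, 0.05]]

print(dtw_cost(a, b), dtw_cost(a, c))
```

In this toy case the time-stretched sequence `b` aligns with `a` at a lower cost than the reversed sequence `c`, which is the property a pattern discovery system relies on when it thresholds alignment costs. Note that with this distance the cost of a sequence against itself is not zero unless every frame is a one-hot posterior.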

Original language: English
Pages (from-to): 903-907
Number of pages: 5
Journal: Qinghua Daxue Xuebao/Journal of Tsinghua University
Volume: 53
Issue number: 6
State: Published - 2013

Keywords

  • Dynamic time warping (DTW)
  • Posteriorgram
  • Segmental dynamic time warping (SDTW)
  • Speech pattern discovery
