On the impact of phoneme alignment in DNN-based speech synthesis

Mei Li, Zhizheng Wu, Lei Xie

Research output: Contribution to conference, Paper, peer-reviewed

4 Citations (Scopus)

Abstract

Recently, deep neural networks (DNNs) have significantly improved the performance of acoustic modeling in statistical parametric speech synthesis (SPSS). However, in current implementations, training a DNN-based speech synthesis system requires the phonetic transcripts to be aligned with the corresponding speech frames to obtain the phonetic segmentation, a step called phoneme alignment. Such an alignment is usually obtained by forced alignment based on hidden Markov models (HMMs), since manual alignment is labor-intensive and time-consuming. In this work, we study the impact of phoneme alignment on the DNN-based speech synthesis system. Specifically, we compare the performance of different DNN-based speech synthesis systems, which use manual alignment and HMM-based forced alignment from three types of labels: HMM mono-phone, tri-phone and full-context. Objective and subjective evaluations are conducted in terms of the naturalness of synthesized speech to compare the performance of the different alignments.

Original language: English
Pages: 196-201
Number of pages: 6
Publication status: Published - 2016
Event: 9th ISCA Speech Synthesis Workshop, SSW 2016 - Sunnyvale, United States
Duration: 13 Sep 2016 to 15 Sep 2016

Conference

Conference: 9th ISCA Speech Synthesis Workshop, SSW 2016
Country/Territory: United States
City: Sunnyvale
Period: 13/09/16 to 15/09/16
