OBJECTIVE DISTANCE MEASURES FOR ASSESSING CONCATENATIVE SPEECH SYNTHESIS

Jing Dong Chen, Nick Campbell

Research output: Contribution to conference, Paper, peer-review

14 Scopus citations

Abstract

Several acoustic transforms of the speech signal are compared for use in the assessment and evaluation of concatenative speech synthesis. The transforms tested include LPC, LSP, MFCC, the bispectrum, the Mellin transform of the log spectrum, and the Wigner-Ville distribution (WVD), among others. For each transform, the distances computed between a synthesised utterance and a naturally spoken version of the same sentence are correlated with perceptually based scores obtained from a MOS evaluation. The results show that the distances computed using the bispectrum have the highest degree of correlation with the MOS score. Both the RMFCC and the LPC outperform the MFCC and the LPCC. The WVD-based cepstrum is found to perform poorly in this task.
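The evaluation procedure summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mean_frame_distance` and `pearson` are hypothetical, the frame distance is assumed to be a plain Euclidean distance over feature vectors that have already been time-aligned to equal length (the abstract does not specify the distance or the alignment method), and the numbers in the usage lines are made-up placeholders rather than results from the paper.

```python
import math


def mean_frame_distance(frames_a, frames_b):
    """Mean Euclidean distance between two time-aligned feature sequences.

    Assumption: both sequences hold one feature vector per frame and have
    already been aligned to the same number of frames.
    """
    if not frames_a or len(frames_a) != len(frames_b):
        raise ValueError("sequences must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(frames_a, frames_b):
        total += math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return total / len(frames_a)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only (NOT data from the paper):
# one objective distance and one MOS score per synthesised utterance.
distances = [0.8, 1.4, 2.1, 2.9]
mos_scores = [4.2, 3.6, 3.1, 2.0]
r = pearson(distances, mos_scores)
print(f"correlation between distance and MOS: {r:.3f}")
```

Under these assumptions, a strongly negative coefficient would indicate that a larger objective distance goes with a lower perceived quality, which is the sense in which one transform can be said to track the MOS score better than another.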

Original language: English
Pages: 611-614
Number of pages: 4
State: Published - 1999
Externally published: Yes
Event: 6th European Conference on Speech Communication and Technology, EUROSPEECH 1999 - Budapest, Hungary
Duration: 5 Sep 1999 to 9 Sep 1999

Conference

Conference: 6th European Conference on Speech Communication and Technology, EUROSPEECH 1999
Country/Territory: Hungary
City: Budapest
Period: 5/09/99 to 9/09/99
