Speaker normalization and novel robust speech feature based on Mellin transform

Jingdong Chen, Bo Xu, Taiyi Huang

Research output: Contribution to journal › Article › peer-review

Abstract

One major source of inter-speaker variability in speaker-independent (SI) speech recognition is variation in vocal tract shape, especially vocal tract length (VTL), among individual speakers. If the vocal tract is modeled as a uniform tube of length L, the formant frequencies of a given sound are inversely proportional to L. Because the VTL ranges from approximately 13 cm for females to over 18 cm for males, formant center frequencies can vary by as much as 25% among speakers. As a result, state-of-the-art SI speech recognizers perform poorly for outlier speakers whose vocal tract shapes differ significantly from those of the speakers in the training set. To reduce the degradation in recognition performance caused by VTL variation among speakers, this paper investigates two methods. The first removes the variability through speaker normalization. The second extracts a new feature based on the Mellin transform (MT). Because the MT is scale invariant, the new feature is insensitive to VTL variation among speakers. Experiments show that both methods improve the performance of SI recognizers, and that the MT-based feature is more effective than speaker normalization.
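
The scale-invariance argument in the abstract can be illustrated numerically. The sketch below is not the authors' feature extraction, which the abstract does not specify. It assumes a toy spectral envelope made of three Gaussian peaks (the center frequencies, width, and the names `envelope` and `mellin_magnitude` are hypothetical), and it evaluates the Mellin transform on the imaginary axis, s = jω, as a Fourier integral over log-frequency. Under the uniform-tube assumption, changing the VTL from 13 cm to 18 cm scales the frequency axis by 18/13, which becomes a pure shift in log-frequency and therefore affects only the phase of the transform, not its magnitude.

```python
import numpy as np

def envelope(freq_hz):
    """Toy spectral envelope: three Gaussian 'formant' peaks (hypothetical values)."""
    centers = (500.0, 1500.0, 2500.0)  # assumed formant centers in Hz
    width = 60.0                       # assumed peak width in Hz
    return sum(np.exp(-0.5 * ((freq_hz - c) / width) ** 2) for c in centers)

def mellin_magnitude(func, omegas, u_min=0.0, u_max=10.0, n=4096):
    """|M(j*omega)| of func, where M(s) = integral of func(x) * x**(s-1) dx.

    Substituting x = exp(u) turns the integral into a Fourier integral of
    func(exp(u)) over u, approximated here by a Riemann sum on a uniform
    grid in u (i.e. a logarithmic grid in x).
    """
    u = np.linspace(u_min, u_max, n, endpoint=False)
    du = (u_max - u_min) / n
    samples = func(np.exp(u))
    kernel = np.exp(1j * np.outer(omegas, u))
    return np.abs(kernel @ samples) * du

omegas = np.linspace(0.0, 20.0, 41)
scale = 18.0 / 13.0  # ratio of the two VTL values quoted in the abstract

reference = mellin_magnitude(envelope, omegas)
warped = mellin_magnitude(lambda f: envelope(scale * f), omegas)

# Largest magnitude difference relative to the largest reference magnitude.
max_rel_diff = np.max(np.abs(reference - warped)) / np.max(reference)
```

In this sketch `max_rel_diff` stays at the level of numerical round-off even though the two envelopes themselves differ strongly on the linear frequency axis. Real speech spectra follow the uniform-tube scaling only approximately, so an actual feature would be invariant only to that extent.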

Original language: English
Pages (from-to): 478-484
Number of pages: 7
Journal: Zidonghua Xuebao/Acta Automatica Sinica
Volume: 26
Issue number: 4
State: Published - Jul 2000
Externally published: Yes
