Speaker normalization and novel robust speech feature based on Mellin transform

Jingdong Chen, Bo Xu, Taiyi Huang

Research output: Contribution to journal › Article › peer-review

Abstract

One major source of inter-speaker variability in speaker-independent (SI) speech recognition is the variation of the vocal tract shape, especially the vocal tract length (VTL), among individual speakers. If the vocal tract is modeled as a uniform tube of length L, then the formant frequencies of utterances of a given sound are inversely proportional to L. Since the VTL can vary from approximately 13 cm for females to over 18 cm for males, formant center frequencies can vary by as much as 25% among speakers. Because of this variability, state-of-the-art SI speech recognizers work poorly for outlier speakers whose vocal tract shapes differ significantly from those of the speakers in the training set. To reduce the degradation in recognition performance caused by VTL variation among speakers, two methods are investigated in this paper. The first removes the variability with a speaker normalization technique. The second extracts a new feature based on the Mellin transform (MT). Because of the scale invariance property of the MT, the new feature is insensitive to the variation of VTL among different speakers. Experiments show that both methods improve the performance of SI recognizers, and that the MT-based feature is more effective than speaker normalization.
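The two ideas in the abstract, formants scaling inversely with L and the scale invariance of the MT, can be illustrated with a small numerical sketch. It is not the authors' feature extraction. It assumes a quarter-wavelength resonance formula for a uniform tube closed at one end, a speed of sound of about 35000 cm/s, and a toy Gaussian spectral envelope. The function names (`tube_formants`, `toy_envelope`, `mellin_magnitude`) are hypothetical, and the Mellin magnitude is approximated as the FFT magnitude of the envelope sampled on a uniform log-frequency grid.

```python
import numpy as np

# Assumption: approximate speed of sound in the vocal tract, in cm/s.
SPEED_OF_SOUND_CM_S = 35000.0


def tube_formants(length_cm, n_formants=3):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) c / (4 L)."""
    n = np.arange(1, n_formants + 1)
    return (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)


def toy_envelope(center_hz, width=0.2):
    """Hypothetical spectral envelope: a Gaussian bump in log-frequency."""
    return lambda f: np.exp(-np.log(f / center_hz) ** 2 / (2 * width ** 2))


def mellin_magnitude(envelope, log_freqs):
    """Approximate |Mellin transform| as |FFT| over a uniform log-frequency grid.

    Scaling the frequency axis shifts the envelope along the log axis, and the
    FFT magnitude is insensitive to such shifts.
    """
    return np.abs(np.fft.rfft(envelope(np.exp(log_freqs))))


short_tract = tube_formants(13.0)  # formants for L = 13 cm
long_tract = tube_formants(18.0)   # formants for L = 18 cm
ratio = long_tract / short_tract   # every formant is scaled by 13/18

# Uniform grid in log-frequency covering 20 Hz to 20 kHz.
log_freqs = np.linspace(np.log(20.0), np.log(20000.0), 1024, endpoint=False)

# The two envelopes are frequency-scaled copies of each other.
mag_short = mellin_magnitude(toy_envelope(short_tract[0]), log_freqs)
mag_long = mellin_magnitude(toy_envelope(long_tract[0]), log_freqs)
max_diff = np.max(np.abs(mag_short - mag_long))

print("formants, L = 13 cm:", short_tract)
print("formants, L = 18 cm:", long_tract)
print("scale factor:", ratio)
print("max difference of Mellin magnitudes:", max_diff)
```

Under these assumptions every formant of the 18 cm tube is 13/18 of the corresponding formant of the 13 cm tube, which is of the same order as the variation quoted in the abstract, while the two magnitude vectors agree to within numerical precision even though the envelopes themselves are shifted in frequency.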

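Original language: English
Pages (from-to): 478-484
Number of pages: 7
Journal: Zidonghua Xuebao/Acta Automatica Sinica
Volume: 26
Issue number: 4
Publication status: Published - Jul 2000
Externally published: Yes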
