Cepstrum derived from differentiated power spectrum for robust speech recognition

Jingdong Chen, Kuldip K. Paliwal, Satoshi Nakamura

Research output: Contribution to journal, Article, peer-reviewed

44 citations (Scopus)

Abstract

In this paper, cepstral features derived from the differential power spectrum (DPS) are proposed for improving the robustness of a speech recognizer in the presence of background noise. These robust features are computed from the speech signal of a given frame in four steps. First, the short-time power spectrum is computed from the speech signal using the fast Fourier transform algorithm. Second, the DPS is obtained by differentiating the power spectrum with respect to frequency. Third, the magnitude of the DPS is projected from linear frequency to the mel scale and smoothed by a filter bank. Finally, the outputs of the filter bank are transformed to cepstral coefficients by the discrete cosine transform after a nonlinear transformation. It is shown that this new feature set can be decomposed as the superposition of the standard cepstrum and its nonlinearly liftered counterpart. While a linear lifter has no effect on continuous-density hidden Markov model based speech recognition, we show that the proposed feature set, which embeds a nonlinear liftering transformation, is quite effective for robust speech recognition. To this end, we conduct a number of speech recognition experiments (including isolated word recognition, connected digit recognition, and large-vocabulary continuous speech recognition) in various operating environments and compare the DPS features with the standard mel-frequency cepstral coefficient features used with cepstral mean normalization and spectral subtraction techniques.
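The four steps described in the abstract can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the abstract does not specify the window, the discrete form of the differentiation, the filter shapes, or the nonlinear transformation. Here a Hamming window, a first-order difference between adjacent frequency bins, triangular mel filters, and a logarithm are assumed, and all function names and default parameters (`dps_cepstrum`, `n_fft=512`, `n_filters=24`, `n_ceps=13`) are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel-scale formula (an assumption; the abstract does not give one).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, bin_freqs, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_bins)."""
    edges_mel = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    edges_hz = mel_to_hz(edges_mel)
    bank = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        # Triangle: the smaller of the two slopes, clipped at zero outside [lo, hi].
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = x.size
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n) @ x


def dps_cepstrum(frame, sample_rate, n_fft=512, n_filters=24, n_ceps=13):
    """Cepstral coefficients from the differential power spectrum of one frame."""
    # Step 1: short-time power spectrum via the FFT (Hamming window assumed).
    windowed = frame * np.hamming(frame.size)
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    # Step 2: differentiate with respect to frequency; a first-order
    # difference between adjacent bins is assumed here.
    dps = power[:-1] - power[1:]
    # Step 3: map the DPS magnitude to the mel scale and smooth with a filter bank.
    bin_freqs = np.arange(dps.size) * sample_rate / n_fft
    bank = mel_filterbank(n_filters, bin_freqs, sample_rate)
    energies = bank @ np.abs(dps)
    # Step 4: nonlinear transformation (log assumed; small floor avoids log(0)),
    # then the discrete cosine transform.
    return dct_ii(np.log(energies + 1e-10), n_ceps)
```

Calling `dps_cepstrum(frame, 16000)` on a 25 ms frame (400 samples at 16 kHz) returns a vector of 13 coefficients. Replacing `np.abs(dps)` with `power` in step 3 gives an ordinary mel-frequency cepstrum under the same assumptions, which makes the two feature sets easy to compare side by side.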

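Original language: English
Pages (from-to): 469-484
Number of pages: 16
Journal: Speech Communication
Volume: 41
Issue number: 2-3
DOI
Publication status: Published - Oct 2003
Externally published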
