Cepstrum derived from differentiated power spectrum for robust speech recognition

Jingdong Chen; Kuldip K. Paliwal; Satoshi Nakamura

doi:10.1016/S0167-6393(03)00016-5

Research output: Contribution to journal › Article › peer-review

44 Scopus citations

Abstract

In this paper, cepstral features derived from the differential power spectrum (DPS) are proposed for improving the robustness of a speech recognizer in the presence of background noise. These robust features are computed from the speech signal of a given frame in four steps. First, the short-time power spectrum of the frame is computed with the fast Fourier transform algorithm. Second, the DPS is obtained by differentiating the power spectrum with respect to frequency. Third, the magnitude of the DPS is projected from linear frequency to the mel scale and smoothed by a filter bank. Finally, the outputs of the filter bank are transformed to cepstral coefficients by the discrete cosine transform after a nonlinear transformation. It is shown that this new feature set can be decomposed as the superposition of the standard cepstrum and its nonlinearly liftered counterpart. While a linear lifter has no effect on continuous-density hidden Markov model based speech recognition, we show that the proposed feature set, with its embedded nonlinear liftering transformation, is quite effective for robust speech recognition. To this end, we conduct a number of speech recognition experiments (including isolated word recognition, connected digit recognition, and large-vocabulary continuous speech recognition) in various operating environments and compare the DPS features with the standard mel-frequency cepstral coefficient features used with cepstral mean normalization and spectral subtraction techniques.
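The four steps in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the differentiation scheme, the filter shapes, or the nonlinear transformation, so the sketch assumes a first-order difference between adjacent frequency bins, triangular mel filters, and a natural logarithm. The function names (`fft`, `mel_filterbank`, `dps_cepstrum`), the default of 20 filters and 12 coefficients, and the small `floor` that guards the logarithm are all hypothetical choices made for illustration. The frame is assumed to be already windowed and to have a power-of-two length.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters with edges equally spaced on the mel scale (assumed shape)."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    # Fractional bin position of each edge; bins 0..n_bins-1 are taken to span 0..Nyquist.
    edges = [f / nyquist * (n_bins - 1) for f in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dps_cepstrum(frame, sample_rate, n_filters=20, n_ceps=12, floor=1e-10):
    """Sketch of DPS-based cepstral features for one frame."""
    n = len(frame)
    # Step 1: short-time power spectrum via the FFT (non-negative frequencies only).
    power = [abs(v) ** 2 for v in fft(frame)[: n // 2 + 1]]
    # Step 2: differentiate with respect to frequency.
    # Assumption: a first-order difference between adjacent bins.
    dps = [power[k] - power[k + 1] for k in range(len(power) - 1)]
    # Step 3: magnitude of the DPS, projected to the mel scale and smoothed.
    mag = [abs(d) for d in dps]
    bank = mel_filterbank(n_filters, len(mag), sample_rate)
    energies = [sum(w * m for w, m in zip(row, mag)) for row in bank]
    # Step 4: nonlinear transformation (assumed: log), then a DCT-II.
    log_e = [math.log(max(e, floor)) for e in energies]
    return [
        sum(v * math.cos(math.pi * q * (j + 0.5) / n_filters)
            for j, v in enumerate(log_e))
        for q in range(n_ceps)
    ]


# Usage: a 256-sample frame of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 440.0 * t / 8000.0) for t in range(256)]
features = dps_cepstrum(frame, sample_rate=8000)
```

The only departure from a conventional mel-cepstrum pipeline in this sketch is step 2 together with the magnitude in step 3; everything else would be the same if `mag` were replaced by `power`.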

Original language	English
Pages (from-to)	469-484
Number of pages	16
Journal	Speech Communication
Volume	41
Issue number	2-3
DOIs	https://doi.org/10.1016/S0167-6393(03)00016-5
State	Published - Oct 2003
Externally published	Yes

Keywords

Cepstral mean normalization
Differential power spectrum
Hidden Markov model
Linear liftering
Robust speech recognition
Spectral subtraction

Cite this

@article{a8a0ec6150014869a39999ab542c48eb,

title = "Cepstrum derived from differentiated power spectrum for robust speech recognition",

abstract = "In this paper, cepstral features derived from the differential power spectrum (DPS) are proposed for improving the robustness of a speech recognizer in presence of background noise. These robust features are computed from the speech signal of a given frame through the following four steps. First, the short-time power spectrum of speech signal is computed from the speech signal through the fast Fourier transform algorithm. Second, DPS is obtained by differentiating the power spectrum with respect to frequency. Third, the magnitude of DPS is projected from linear frequency to the mel scale and smoothed by a filter bank. Finally, the outputs of the filter bank are transformed to cepstral coefficients by the discrete cosine transform after a nonlinear transformation. It is shown that this new feature set can be decomposed as the superposition of the standard cepstrum and its nonlinearly liftered counterpart. While a linear lifter has no effect on the continuous density hidden Markov model based speech recognition, we show that the proposed feature set embedded with a nonlinear liftering transformation is quite effective for robust speech recognition. For this, we conduct a number of speech recognition experiments (including isolated word recognition, connected digits recognition, and large vocabulary continuous speech recognition) in various operating environments and compare the DPS features with the standard mel-frequency cepstral coefficient features used with cepstral mean normalization and spectral subtraction techniques.",

keywords = "Cepstral mean normalization, Differential power spectrum, Hidden Markov model, Linear liftering, Robust speech recognition, Spectral subtraction",

author = "Jingdong Chen and Paliwal, {Kuldip K.} and Satoshi Nakamura",

year = "2003",

month = oct,

doi = "10.1016/S0167-6393(03)00016-5",

language = "English",

volume = "41",

pages = "469--484",

journal = "Speech Communication",

issn = "0167-6393",

publisher = "Elsevier B.V.",

number = "2-3",

}

TY - JOUR

T1 - Cepstrum derived from differentiated power spectrum for robust speech recognition

AU - Chen, Jingdong

AU - Paliwal, Kuldip K.

AU - Nakamura, Satoshi

PY - 2003/10

Y1 - 2003/10

N2 - In this paper, cepstral features derived from the differential power spectrum (DPS) are proposed for improving the robustness of a speech recognizer in presence of background noise. These robust features are computed from the speech signal of a given frame through the following four steps. First, the short-time power spectrum of speech signal is computed from the speech signal through the fast Fourier transform algorithm. Second, DPS is obtained by differentiating the power spectrum with respect to frequency. Third, the magnitude of DPS is projected from linear frequency to the mel scale and smoothed by a filter bank. Finally, the outputs of the filter bank are transformed to cepstral coefficients by the discrete cosine transform after a nonlinear transformation. It is shown that this new feature set can be decomposed as the superposition of the standard cepstrum and its nonlinearly liftered counterpart. While a linear lifter has no effect on the continuous density hidden Markov model based speech recognition, we show that the proposed feature set embedded with a nonlinear liftering transformation is quite effective for robust speech recognition. For this, we conduct a number of speech recognition experiments (including isolated word recognition, connected digits recognition, and large vocabulary continuous speech recognition) in various operating environments and compare the DPS features with the standard mel-frequency cepstral coefficient features used with cepstral mean normalization and spectral subtraction techniques.

KW - Cepstral mean normalization

KW - Differential power spectrum

KW - Hidden Markov model

KW - Linear liftering

KW - Robust speech recognition

KW - Spectral subtraction

UR - http://www.scopus.com/inward/record.url?scp=0038373389&partnerID=8YFLogxK

U2 - 10.1016/S0167-6393(03)00016-5

DO - 10.1016/S0167-6393(03)00016-5

M3 - Article

AN - SCOPUS:0038373389

SN - 0167-6393

VL - 41

SP - 469

EP - 484

JO - Speech Communication

JF - Speech Communication

IS - 2-3

ER -
