A block cosine transform and its application in speech recognition

Jingdong Chen, Kuldip K. Paliwal, Satoshi Nakamura

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Scopus citations

Abstract

Noise robust speech recognition has become an important area of research in recent years. The fact that human listeners can recognize speech in the presence of strong noise inspires researchers to imitate some aspects of human auditory perception in automatic speech recognition. This has led to sub-band based speech recognition in which the full-band speech is split into several sub-bands and where each sub-band is processed separately. The resulting multi-band features can be combined in various ways for carrying out speech recognition task. Reported results have shown the superiority of this technique for speech recognition in strong noise conditions. In this paper, we will briefly review the multi-band feature extraction. We will then propose a block discrete cosine transform (BDCT) with its kernel transformation matrix being derived from the decomposition of the kernel of the discrete cosine transform (DCT). We show that the BDCT approximates the DCT in keeping information in decorrelating a sequence. When the BDCT is applied to the mel frequency filter bank energies (FBEs) to replace the DCT to convert them to cepstral coefficients, a new kind of MFCCs is yielded. We call these new features Block discrete cosine transform based MFCCs (BMFCCs) and show that a sub-band processing idea is implicit in the BMFCCs since the BDCT automatically divides the mel frequency FBEs into two sub-bands. We will report various speech recognition results using the BMFCCs as well as the comparison with the multi-band MFCCs and fullband MFCCs to elaborate the properties of the BMFCCs.

Original languageEnglish
Title of host publication6th International Conference on Spoken Language Processing, ICSLP 2000
PublisherInternational Speech Communication Association
ISBN (Electronic)7801501144, 9787801501141
StatePublished - 2000
Externally publishedYes
Event6th International Conference on Spoken Language Processing, ICSLP 2000 - Beijing, China
Duration: 16 Oct 200020 Oct 2000

Publication series

Name6th International Conference on Spoken Language Processing, ICSLP 2000

Conference

Conference6th International Conference on Spoken Language Processing, ICSLP 2000
Country/TerritoryChina
CityBeijing
Period16/10/0020/10/00

Fingerprint

Dive into the research topics of 'A block cosine transform and its application in speech recognition'. Together they form a unique fingerprint.

Cite this