A block cosine transform and its application in speech recognition

Jingdong Chen, Kuldip K. Paliwal, Satoshi Nakamura

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Peer-reviewed

2 Citations (Scopus)

Abstract

Noise-robust speech recognition has become an important area of research in recent years. The fact that human listeners can recognize speech in the presence of strong noise has inspired researchers to imitate some aspects of human auditory perception in automatic speech recognition. This has led to sub-band-based speech recognition, in which the full-band speech is split into several sub-bands and each sub-band is processed separately. The resulting multi-band features can be combined in various ways to carry out the speech recognition task. Reported results have shown the superiority of this technique for speech recognition in strong noise conditions. In this paper, we briefly review multi-band feature extraction. We then propose a block discrete cosine transform (BDCT) whose kernel transformation matrix is derived from a decomposition of the kernel of the discrete cosine transform (DCT). We show that the BDCT approximates the DCT in its ability to retain information while decorrelating a sequence. When the BDCT replaces the DCT in converting the mel frequency filter bank energies (FBEs) to cepstral coefficients, a new kind of MFCC is obtained. We call these new features block discrete cosine transform-based MFCCs (BMFCCs) and show that a sub-band processing idea is implicit in the BMFCCs, since the BDCT automatically divides the mel frequency FBEs into two sub-bands. We report various speech recognition results using the BMFCCs, together with comparisons against multi-band MFCCs and full-band MFCCs, to illustrate the properties of the BMFCCs.
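The abstract does not give the BDCT kernel itself, so the sketch below only illustrates the two operations it contrasts: the conventional full-band step that turns log mel FBEs into cepstral coefficients with a DCT, and a hypothetical two-sub-band variant that applies a separate DCT to the lower and upper halves of the FBE vector. The function names, the orthonormal DCT-II scaling, the split point, and the log floor are all assumptions made for illustration. In particular, `two_band_cepstrum` is a plain multi-band-style split, not the paper's BDCT.

```python
import math
from typing import List, Optional, Sequence


def dct_ii(x: Sequence[float], num_coeffs: Optional[int] = None) -> List[float]:
    """Orthonormal DCT-II of x, truncated to the first num_coeffs outputs."""
    n = len(x)
    if n == 0:
        raise ValueError("input must be non-empty")
    k_max = n if num_coeffs is None else num_coeffs
    if not 1 <= k_max <= n:
        raise ValueError("num_coeffs must be between 1 and len(x)")
    coeffs = []
    for k in range(k_max):
        # Orthonormal scaling (an assumption; MFCC front ends differ here).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n)
        )
        coeffs.append(scale * total)
    return coeffs


def log_energies(fbes: Sequence[float], floor: float = 1e-10) -> List[float]:
    """Natural log of filter bank energies, floored to avoid log(0)."""
    return [math.log(max(e, floor)) for e in fbes]


def fullband_cepstrum(fbes: Sequence[float], num_coeffs: int) -> List[float]:
    """Conventional MFCC step: one DCT over all log mel FBEs."""
    return dct_ii(log_energies(fbes), num_coeffs)


def two_band_cepstrum(
    fbes: Sequence[float], split: int, coeffs_per_band: int
) -> List[float]:
    """Hypothetical sub-band variant: separate DCTs over the low and high
    parts of the log FBE vector, concatenated (NOT the paper's BDCT kernel)."""
    if not 0 < split < len(fbes):
        raise ValueError("split must leave at least one channel in each band")
    log_e = log_energies(fbes)
    low = dct_ii(log_e[:split], coeffs_per_band)
    high = dct_ii(log_e[split:], coeffs_per_band)
    return low + high


if __name__ == "__main__":
    # Hypothetical 24-channel mel FBE frame with made-up values.
    frame = [1.0 + 0.5 * math.sin(0.3 * m) for m in range(24)]
    print(fullband_cepstrum(frame, 13))
    print(two_band_cepstrum(frame, split=12, coeffs_per_band=7))
```

In the full-band version every cepstral coefficient mixes all filter bank channels. In the two-band version a disturbance confined to one frequency region only affects the coefficients of its own band, which is the intuition behind sub-band processing that the abstract describes.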

Original language: English
Host publication title: 6th International Conference on Spoken Language Processing, ICSLP 2000
Publisher: International Speech Communication Association
ISBN (Electronic): 7801501144, 9787801501141
Publication status: Published - 2000
Externally published: Yes
Event: 6th International Conference on Spoken Language Processing, ICSLP 2000 - Beijing, China
Duration: 16 Oct 2000 to 20 Oct 2000

Publication series

Name: 6th International Conference on Spoken Language Processing, ICSLP 2000

Conference

Conference: 6th International Conference on Spoken Language Processing, ICSLP 2000
Country/Territory: China
City: Beijing
Period: 16/10/00 to 20/10/00
