A front-end speech enhancement system for robust automotive speech recognition

Haikun Wang; Zhongfu Ye; Jingdong Chen

doi:10.1109/ISCSLP.2018.8706649


School of Marine Science and Technology

University of Science and Technology of China

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

2 Citations (Scopus)

Abstract

This paper presents a front-end speech enhancement approach to robust speech recognition in automotive environments. It combines model-based voice activity detection (VAD), relative transfer function (RTF) based generalized sidelobe cancellation, and single-channel post-filtering to enhance the speech signal of interest, thereby improving the robustness of speech recognition. First, we record training data in four typical driving scenarios that together cover most of the noise types encountered in automobiles. The recorded data are then used to train Gaussian mixture models (GMMs) for both speech and noise. The trained GMMs are subsequently used to estimate the speech presence probability on a frame-by-frame basis. This speech presence probability then serves as the basic information for RTF estimation, adaptive beamforming, and post-filtering. Experiments are conducted in real automotive environments, and the results show that the developed method can significantly improve the performance of both VAD and automatic speech recognition (ASR).
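The abstract describes a frame-level speech presence probability (SPP) computed from trained speech and noise GMMs, which then drives the later processing stages. The sketch below shows one plausible way to compute such an SPP; it is not the authors' implementation. It assumes diagonal-covariance GMMs whose parameters are already trained, an arbitrary per-frame feature vector, and an equal prior (`prior=0.5`); all function names are hypothetical. The noise-PSD update at the end is a generic SPP-controlled recursive average in the style of MCRA, included only to illustrate how an SPP can gate noise statistics. The paper's actual RTF, beamforming, and post-filter update rules are not given in the abstract.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical diagonal-covariance GMM: (weights, means, variances), with one
# mean vector and one variance vector per mixture component (assumed pre-trained).
GMM = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[Sequence[float]]]


def gmm_log_likelihood(x: Sequence[float], gmm: GMM) -> float:
    """Log-likelihood log p(x) of one feature vector under a diagonal GMM."""
    weights, means, variances = gmm
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        # log N(x; mu, diag(var)), summed over feature dimensions
        log_n = 0.0
        for xi, mi, vi in zip(x, mu, var):
            log_n += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp_logs.append(math.log(w) + log_n)
    # Log-sum-exp over components for numerical stability
    m = max(comp_logs)
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))


def speech_presence_probability(
    x: Sequence[float], speech_gmm: GMM, noise_gmm: GMM, prior: float = 0.5
) -> float:
    """Posterior P(speech | x) from the two class-conditional GMMs (Bayes' rule)."""
    logit = (
        math.log(prior) - math.log(1.0 - prior)
        + gmm_log_likelihood(x, speech_gmm)
        - gmm_log_likelihood(x, noise_gmm)
    )
    # Numerically stable logistic function
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def spp_weighted_noise_update(
    noise_psd: Sequence[float], frame_psd: Sequence[float], spp: float, alpha: float = 0.95
) -> List[float]:
    """MCRA-style illustration: frames that probably contain speech
    (spp close to 1) barely change the running noise PSD estimate."""
    a = alpha + (1.0 - alpha) * spp  # effective smoothing factor in [alpha, 1]
    return [a * n + (1.0 - a) * y for n, y in zip(noise_psd, frame_psd)]


if __name__ == "__main__":
    # Toy 1-D models: speech centred at 3.0, noise at 0.0, unit variance.
    speech = ([1.0], [[3.0]], [[1.0]])
    noise = ([1.0], [[0.0]], [[1.0]])
    for value in (0.0, 1.5, 3.0):
        print(value, speech_presence_probability([value], speech, noise))
```

With these toy models, a feature value of 1.5 lies exactly halfway between the two class means, so the SPP evaluates to 0.5; values near 3.0 push it toward 1 and values near 0.0 push it toward 0. In a real system the per-frame SPP would be fed to whichever RTF, beamformer, and post-filter updates are used.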

Original language	English
Host publication title	2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	1-5
Number of pages	5
ISBN (Electronic)	9781538656273
DOI	https://doi.org/10.1109/ISCSLP.2018.8706649
Publication status	Published - 2 Jul 2018
Event	11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Taipei, Taiwan, China. Duration: 26 Nov 2018 → 29 Nov 2018

Publication series

Name	2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings

Conference

Conference	11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018
Country/Territory	Taiwan, China
City	Taipei
Period	26/11/18 → 29/11/18

Access to Document

10.1109/ISCSLP.2018.8706649

Other files and links

Link to publication in Scopus

Cite this

Wang, H., Ye, Z., & Chen, J. (2018). A front-end speech enhancement system for robust automotive speech recognition. In 2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings (pp. 1-5). Article 8706649 (2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISCSLP.2018.8706649


@inproceedings{e133f10d115e4c738c12f11e3aff843e,

title = "A front-end speech enhancement system for robust automotive speech recognition",

abstract = "This paper presents a front-end speech enhancement approach to robust speech recognition in automotive environments. It combines model-based voice activity detection (VAD), relative transfer function (RTF) based generalized sidelobe cancellation, and single-channel post-filtering to enhance the speech signal of interest, thereby improving the robustness of speech recognition. First, we record training data in four typical driving scenarios that together cover most of the noise types encountered in automobiles. The recorded data are then used to train Gaussian mixture models (GMMs) for both speech and noise. The trained GMMs are subsequently used to estimate the speech presence probability on a frame-by-frame basis. This speech presence probability then serves as the basic information for RTF estimation, adaptive beamforming, and post-filtering. Experiments are conducted in real automotive environments, and the results show that the developed method can significantly improve the performance of both VAD and automatic speech recognition (ASR).",

keywords = "Generalized sidelobe cancellation, Microphone array, Model-based, Relative transfer function estimation, Speech enhancement, Speech recognition, Voice activity detection",

author = "Haikun Wang and Zhongfu Ye and Jingdong Chen",

note = "Publisher Copyright: © 2018 IEEE; 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 ; Conference date: 26-11-2018 Through 29-11-2018",

year = "2018",

month = jul,

day = "2",

doi = "10.1109/ISCSLP.2018.8706649",

language = "English",

series = "2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "1--5",

booktitle = "2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings",

}
