TY - GEN
T1 - A front-end speech enhancement system for robust automotive speech recognition
AU - Wang, Haikun
AU - Ye, Zhongfu
AU - Chen, Jingdong
N1 - Publisher Copyright:
© 2018 IEEE
PY - 2018/7/2
Y1 - 2018/7/2
AB - This paper presents a front-end speech enhancement approach to robust speech recognition in automotive environments. It combines model-based voice activity detection (VAD), relative transfer function (RTF) based generalized sidelobe cancellation, and single-channel post-filtering to enhance the speech signal of interest, thereby improving the robustness of speech recognition. First, we choose four typical driving scenarios, which cover most of the noise types in automobiles, to record training data. The recorded data are then used to train Gaussian mixture models (GMMs) for both speech and noise. The trained GMMs are subsequently used to estimate the speech presence probability on a frame-by-frame basis. This speech presence probability then serves as the basic information for RTF estimation, adaptive beamforming, and post-filtering. Experiments are conducted in real automotive environments, and the results show that the developed method can significantly improve the performance of both VAD and automatic speech recognition (ASR).
KW - Generalized sidelobe cancellation
KW - Microphone array
KW - Model-based
KW - Relative transfer function estimation
KW - Speech enhancement
KW - Speech recognition
KW - Voice activity detection
UR - http://www.scopus.com/inward/record.url?scp=85065866793&partnerID=8YFLogxK
U2 - 10.1109/ISCSLP.2018.8706649
DO - 10.1109/ISCSLP.2018.8706649
M3 - Conference contribution
AN - SCOPUS:85065866793
T3 - 2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings
SP - 1
EP - 5
BT - 2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018
Y2 - 26 November 2018 through 29 November 2018
ER -