A speech enhancement system for automotive speech recognition with a hybrid voice activity detection method

Haikun Wang; Zhongfu Ye; Jingdong Chen

doi:10.1109/IWAENC.2018.8521410

School of Marine Science and Technology

University of Science and Technology of China

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Citations (Scopus)

Abstract

This paper presents a front-end speech enhancement approach to robust speech recognition in automotive environments. It combines hybrid voice activity detection (VAD), relative transfer function (RTF)-based generalized sidelobe cancelation, and single-channel post-filtering to enhance the speech signal of interest, thereby improving the robustness of speech recognition. First, we record training data in four typical driving scenarios, which together include most of the noise types found in automobiles. The recorded data are then used to train deep neural network (DNN) models for both speech and noise. The trained DNNs are subsequently used to estimate the speech presence probability on a frame-by-frame basis. This speech presence probability is then combined with the output of an energy-based VAD to form a hybrid VAD, which serves as the basis for the remaining components of the speech enhancement system, including RTF estimation, adaptive beamforming, and post-filtering. Experiments are conducted in real automotive environments. The results show that the developed method can significantly improve the performance of both VAD and automatic speech recognition (ASR).
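The abstract describes the hybrid VAD only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' method. It treats the DNN speech presence probability (SPP) as a given per-frame input, derives a binary energy-based VAD from frame energies against a crude noise-floor estimate, and fuses the two with a weighted sum and a threshold. The function names, the 10th-percentile noise floor, the 6 dB margin, the 0.7 weight and the 0.5 threshold are all hypothetical choices; the paper's actual fusion rule is not given in the abstract.

```python
import math
from typing import List, Sequence


def frame_energies_db(signal: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square energy (dB) of consecutive, non-overlapping frames."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # small floor avoids log10(0)
    return energies


def energy_vad(energies_db: Sequence[float], margin_db: float = 6.0) -> List[bool]:
    """Binary energy-based VAD (hypothetical settings).

    The noise floor is assumed to be the 10th-percentile frame energy; a frame
    is flagged as speech if it exceeds that floor by more than margin_db.
    """
    ranked = sorted(energies_db)
    noise_floor_db = ranked[int(0.1 * (len(ranked) - 1))]
    return [e > noise_floor_db + margin_db for e in energies_db]


def hybrid_vad(spp: Sequence[float], energy_flags: Sequence[bool],
               weight: float = 0.7, threshold: float = 0.5) -> List[bool]:
    """Fuse a per-frame speech presence probability with a binary energy VAD.

    Hypothetical fusion rule: score = weight * SPP + (1 - weight) * energy_flag,
    declared speech when score > threshold.
    """
    if len(spp) != len(energy_flags):
        raise ValueError("spp and energy_flags need one entry per frame")
    decisions = []
    for p, flag in zip(spp, energy_flags):
        score = weight * p + (1.0 - weight) * (1.0 if flag else 0.0)
        decisions.append(score > threshold)
    return decisions


if __name__ == "__main__":
    fs, frame_len = 16000, 160  # 10 ms frames at an assumed 16 kHz rate
    quiet = [0.01 * (-1) ** n for n in range(4 * frame_len)]  # low-level "noise"
    loud = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(2 * frame_len)]
    signal = quiet + loud + quiet  # 10 frames; frames 4 and 5 (0-based) are loud
    flags = energy_vad(frame_energies_db(signal, frame_len))
    # Stand-in for the DNN-derived SPP (made-up values, one per frame).
    spp = [0.1, 0.1, 0.1, 0.8, 0.9, 0.2, 0.1, 0.1, 0.1, 0.1]
    print(hybrid_vad(spp, flags))
    # prints [False, False, False, True, True, False, False, False, False, False]
```

With these assumed settings, the energy decision alone cannot declare speech (it contributes at most 0.3), while a confident SPP can (0.7 * 0.8 > 0.5); a loud frame that the speech model scores as unlikely speech is therefore rejected. Other fusion rules, such as a logical AND or OR of the two decisions, are equally plausible readings of the abstract.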

Original language	English
Host publication title	16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	456-460
Number of pages	5
ISBN (Electronic)	9781538681510
DOI	https://doi.org/10.1109/IWAENC.2018.8521410
Publication status	Published - 2 Nov 2018
Event	16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Tokyo, Japan; Duration: 17 Sep 2018 → 20 Sep 2018

Publication series

Name	16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings

Conference

Conference	16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018
Country/Territory	Japan
City	Tokyo
Period	17/09/18 → 20/09/18

Access to document

10.1109/IWAENC.2018.8521410

Cite this

Wang, H., Ye, Z., & Chen, J. (2018). A speech enhancement system for automotive speech recognition with a hybrid voice activity detection method. In 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings (pp. 456-460). Article 8521410 (16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IWAENC.2018.8521410

Wang, Haikun ; Ye, Zhongfu ; Chen, Jingdong. / A speech enhancement system for automotive speech recognition with a hybrid voice activity detection method. 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 456-460 (16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings).

@inproceedings{f29c11d4450a41e691808a572cb25fb9,
  title = "A speech enhancement system for automotive speech recognition with a hybrid voice activity detection method",
  abstract = "This paper presents a front-end speech enhancement approach to robust speech recognition in automotive environments. It combines hybrid voice activity detection (VAD), relative transfer function (RTF)-based generalized sidelobe cancelation, and single-channel post-filtering to enhance the speech signal of interest, thereby improving the robustness of speech recognition. First, we record training data in four typical driving scenarios, which together include most of the noise types found in automobiles. The recorded data are then used to train deep neural network (DNN) models for both speech and noise. The trained DNNs are subsequently used to estimate the speech presence probability on a frame-by-frame basis. This speech presence probability is then combined with the output of an energy-based VAD to form a hybrid VAD, which serves as the basis for the remaining components of the speech enhancement system, including RTF estimation, adaptive beamforming, and post-filtering. Experiments are conducted in real automotive environments. The results show that the developed method can significantly improve the performance of both VAD and automatic speech recognition (ASR).",
  keywords = "Deep neural network, Microphone array, Speech enhancement, Speech recognition, Voice activity detection",
  author = "Haikun Wang and Zhongfu Ye and Jingdong Chen",
  note = "Publisher Copyright: {\textcopyright} 2018 IEEE.; 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 ; Conference date: 17-09-2018 Through 20-09-2018",
  year = "2018",
  month = nov,
  day = "2",
  doi = "10.1109/IWAENC.2018.8521410",
  language = "English",
  series = "16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  pages = "456--460",
  booktitle = "16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings",
}

Wang, H, Ye, Z & Chen, J 2018, A speech enhancement system for automotive speech recognition with a hybrid voice activity detection method. in 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings, 8521410, 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings, Institute of Electrical and Electronics Engineers Inc., pp. 456-460, 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018, Tokyo, Japan, 17/09/18. https://doi.org/10.1109/IWAENC.2018.8521410

A speech enhancement system for automotive speech recognition with a hybrid voice activity detection method. / Wang, Haikun; Ye, Zhongfu; Chen, Jingdong.
16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 456-460, 8521410 (16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings).

TY  - GEN
T1  - A speech enhancement system for automotive speech recognition with a hybrid voice activity detection method
AU  - Wang, Haikun
AU  - Ye, Zhongfu
AU  - Chen, Jingdong
PY  - 2018/11/2
Y1  - 2018/11/2
N2  - This paper presents a front-end speech enhancement approach to robust speech recognition in automotive environments. It combines hybrid voice activity detection (VAD), relative transfer function (RTF)-based generalized sidelobe cancelation, and single-channel post-filtering to enhance the speech signal of interest, thereby improving the robustness of speech recognition. First, we record training data in four typical driving scenarios, which together include most of the noise types found in automobiles. The recorded data are then used to train deep neural network (DNN) models for both speech and noise. The trained DNNs are subsequently used to estimate the speech presence probability on a frame-by-frame basis. This speech presence probability is then combined with the output of an energy-based VAD to form a hybrid VAD, which serves as the basis for the remaining components of the speech enhancement system, including RTF estimation, adaptive beamforming, and post-filtering. Experiments are conducted in real automotive environments. The results show that the developed method can significantly improve the performance of both VAD and automatic speech recognition (ASR).
AB  - This paper presents a front-end speech enhancement approach to robust speech recognition in automotive environments. It combines hybrid voice activity detection (VAD), relative transfer function (RTF)-based generalized sidelobe cancelation, and single-channel post-filtering to enhance the speech signal of interest, thereby improving the robustness of speech recognition. First, we record training data in four typical driving scenarios, which together include most of the noise types found in automobiles. The recorded data are then used to train deep neural network (DNN) models for both speech and noise. The trained DNNs are subsequently used to estimate the speech presence probability on a frame-by-frame basis. This speech presence probability is then combined with the output of an energy-based VAD to form a hybrid VAD, which serves as the basis for the remaining components of the speech enhancement system, including RTF estimation, adaptive beamforming, and post-filtering. Experiments are conducted in real automotive environments. The results show that the developed method can significantly improve the performance of both VAD and automatic speech recognition (ASR).
KW  - Deep neural network
KW  - Microphone array
KW  - Speech enhancement
KW  - Speech recognition
KW  - Voice activity detection
UR  - http://www.scopus.com/inward/record.url?scp=85057420738&partnerID=8YFLogxK
U2  - 10.1109/IWAENC.2018.8521410
DO  - 10.1109/IWAENC.2018.8521410
M3  - Conference contribution
AN  - SCOPUS:85057420738
T3  - 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings
SP  - 456
EP  - 460
BT  - 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings
PB  - Institute of Electrical and Electronics Engineers Inc.
T2  - 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018
Y2  - 17 September 2018 through 20 September 2018
ER  -

Wang H, Ye Z, Chen J. A speech enhancement system for automotive speech recognition with a hybrid voice activity detection method. In 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2018. p. 456-460. 8521410. (16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings). doi: 10.1109/IWAENC.2018.8521410
