Investigating generative adversarial networks based speech dereverberation for robust speech recognition

Ke Wang; Junbo Zhang; Sining Sun; Yujun Wang; Fei Xiang; Lei Xie

doi:10.21437/Interspeech.2018-1780

School of Computer Science

Research output: Contribution to journal › Conference article › peer-review

17 Scopus citations

Abstract

We investigate the use of generative adversarial networks (GANs) in speech dereverberation for robust speech recognition. GANs have recently been studied for speech enhancement to remove additive noise, but no work has yet examined their ability in speech dereverberation, and the advantages of using GANs have not been fully established. In this paper, we provide an in-depth investigation of a GAN-based dereverberation front-end for ASR. First, we study the effectiveness of different dereverberation networks (the generator in the GAN) and find that an LSTM leads to a significant improvement compared with a feed-forward DNN and a CNN on our dataset. Second, adding residual connections to the deep LSTMs further boosts performance. Finally, we find that, for the GAN to succeed, it is important to update the generator and the discriminator using the same mini-batch of data during training. Moreover, using the reverberant spectrogram as a condition to the discriminator, as suggested in previous studies, may degrade performance. In summary, our GAN-based dereverberation front-end achieves a 14% to 19% relative CER reduction compared with the baseline DNN dereverberation network when tested on a strong multi-condition training acoustic model.
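
A relative CER reduction is conventionally the drop in error rate divided by the baseline error rate. The sketch below illustrates that arithmetic only. The function name `relative_reduction` and the CER values are assumptions chosen for illustration; they are not taken from the paper.

```python
def relative_reduction(baseline_cer: float, system_cer: float) -> float:
    """Return the relative error-rate reduction of a system over a baseline, as a fraction."""
    if baseline_cer <= 0:
        raise ValueError("baseline_cer must be positive")
    return (baseline_cer - system_cer) / baseline_cer


# Hypothetical CER values (in percent), used only to show the calculation.
example = relative_reduction(baseline_cer=20.0, system_cer=17.0)
print(f"relative CER reduction: {example:.0%}")
```

Under this definition, a reduction quoted as relative depends on the baseline: the same absolute drop counts for more when the baseline error rate is lower.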

Original language	English
Pages (from-to)	1581-1585
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2018-September
DOIs	https://doi.org/10.21437/Interspeech.2018-1780
State	Published - 2018
Event	19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018, Hyderabad, India; duration: 2 Sep 2018 to 6 Sep 2018

Keywords

Generative adversarial nets
Residual networks
Robust speech recognition
Speech dereverberation

Cite this

@article{44183a7bea004664addb07e9d839dceb,

title = "Investigating generative adversarial networks based speech dereverberation for robust speech recognition",

abstract = "We investigate the use of generative adversarial networks (GANs) in speech dereverberation for robust speech recognition. GANs have recently been studied for speech enhancement to remove additive noise, but no work has yet examined their ability in speech dereverberation, and the advantages of using GANs have not been fully established. In this paper, we provide an in-depth investigation of a GAN-based dereverberation front-end for ASR. First, we study the effectiveness of different dereverberation networks (the generator in the GAN) and find that an LSTM leads to a significant improvement compared with a feed-forward DNN and a CNN on our dataset. Second, adding residual connections to the deep LSTMs further boosts performance. Finally, we find that, for the GAN to succeed, it is important to update the generator and the discriminator using the same mini-batch of data during training. Moreover, using the reverberant spectrogram as a condition to the discriminator, as suggested in previous studies, may degrade performance. In summary, our GAN-based dereverberation front-end achieves a 14% to 19% relative CER reduction compared with the baseline DNN dereverberation network when tested on a strong multi-condition training acoustic model.",

keywords = "Generative adversarial nets, Residual networks, Robust speech recognition, Speech dereverberation",

author = "Ke Wang and Junbo Zhang and Sining Sun and Yujun Wang and Fei Xiang and Lei Xie",

note = "Publisher Copyright: {\textcopyright} 2018 International Speech Communication Association. All rights reserved.; 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 ; Conference date: 02-09-2018 Through 06-09-2018",

year = "2018",

doi = "10.21437/Interspeech.2018-1780",

language = "English",

volume = "2018-September",

pages = "1581--1585",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

Investigating generative adversarial networks based speech dereverberation for robust speech recognition. / Wang, Ke; Zhang, Junbo; Sun, Sining et al.
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2018-September, 2018, p. 1581-1585.

TY - JOUR

T1 - Investigating generative adversarial networks based speech dereverberation for robust speech recognition

AU - Wang, Ke

AU - Zhang, Junbo

AU - Sun, Sining

AU - Wang, Yujun

AU - Xiang, Fei

AU - Xie, Lei

PY - 2018

Y1 - 2018

N2 - We investigate the use of generative adversarial networks (GANs) in speech dereverberation for robust speech recognition. GANs have recently been studied for speech enhancement to remove additive noise, but no work has yet examined their ability in speech dereverberation, and the advantages of using GANs have not been fully established. In this paper, we provide an in-depth investigation of a GAN-based dereverberation front-end for ASR. First, we study the effectiveness of different dereverberation networks (the generator in the GAN) and find that an LSTM leads to a significant improvement compared with a feed-forward DNN and a CNN on our dataset. Second, adding residual connections to the deep LSTMs further boosts performance. Finally, we find that, for the GAN to succeed, it is important to update the generator and the discriminator using the same mini-batch of data during training. Moreover, using the reverberant spectrogram as a condition to the discriminator, as suggested in previous studies, may degrade performance. In summary, our GAN-based dereverberation front-end achieves a 14% to 19% relative CER reduction compared with the baseline DNN dereverberation network when tested on a strong multi-condition training acoustic model.

AB - We investigate the use of generative adversarial networks (GANs) in speech dereverberation for robust speech recognition. GANs have recently been studied for speech enhancement to remove additive noise, but no work has yet examined their ability in speech dereverberation, and the advantages of using GANs have not been fully established. In this paper, we provide an in-depth investigation of a GAN-based dereverberation front-end for ASR. First, we study the effectiveness of different dereverberation networks (the generator in the GAN) and find that an LSTM leads to a significant improvement compared with a feed-forward DNN and a CNN on our dataset. Second, adding residual connections to the deep LSTMs further boosts performance. Finally, we find that, for the GAN to succeed, it is important to update the generator and the discriminator using the same mini-batch of data during training. Moreover, using the reverberant spectrogram as a condition to the discriminator, as suggested in previous studies, may degrade performance. In summary, our GAN-based dereverberation front-end achieves a 14% to 19% relative CER reduction compared with the baseline DNN dereverberation network when tested on a strong multi-condition training acoustic model.

KW - Generative adversarial nets

KW - Residual networks

KW - Robust speech recognition

KW - Speech dereverberation

UR - http://www.scopus.com/inward/record.url?scp=85054959703&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2018-1780

DO - 10.21437/Interspeech.2018-1780

M3 - Conference article

AN - SCOPUS:85054959703

SN - 2308-457X

VL - 2018-September

SP - 1581

EP - 1585

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018

Y2 - 2 September 2018 through 6 September 2018

ER -
