Domain Adversarial Training for Improving Keyword Spotting Performance of ESL Speech

Jingyong Hou, Pengcheng Guo, Sining Sun, Frank K. Soong, Wenping Hu, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

10 Scopus citations

Abstract

Second language (L2) learners often struggle with both pronunciation and word formation in L2, so their speech is poorly recognized by a recognizer trained on native data. Domain adversarial training (DAT), which can reduce the acoustic mismatch between training and testing conditions, can therefore help improve speech recognition for L2 learners. To cope with ungrammatical L2 speech in scenario-based conversation training, keyword spotting (KWS) is an effective solution because it relaxes the language model constraint in decoding. On the acoustic pronunciation side, this study investigates DAT for training a neural network-based acoustic model. The DAT model is trained on speech from both native speakers and English as a second language (ESL) learners so that it extracts features that are more invariant between native and ESL speech by equalizing their intrinsic differences. The model is jointly optimized for improved senone classification during training. When tested on ESL learners' speech and native English speech, the DAT model achieves recognition performance comparable to that of a jointly trained multi-condition model while significantly improving native speech recognition. In KWS, DAT consistently outperforms multi-condition training. These improvements are obtained without increasing the model's computational complexity or size.
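
As a rough illustration of the adversarial objective described in the abstract, the sketch below assumes the commonly used gradient-reversal formulation of DAT: a shared feature extractor feeds a senone classifier and a domain (native vs. ESL) classifier, and the domain gradient is sign-flipped before it reaches the shared features. The abstract does not give the network architecture, the loss weighting, or the training recipe, so the linear heads, the weight `lam`, the toy numbers, and all function names here are hypothetical and are not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def cross_entropy_grad(logits, target):
    """Gradient of the cross-entropy loss w.r.t. the logits: softmax - one_hot."""
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


def backprop_linear(weights, grad_logits):
    """Gradient w.r.t. the input of a linear layer computing logits = W @ x."""
    n_in = len(weights[0])
    return [
        sum(weights[k][j] * grad_logits[k] for k in range(len(weights)))
        for j in range(n_in)
    ]


def shared_feature_grad(features, w_senone, w_domain, senone_id, domain_id, lam):
    """Gradient that reaches the shared features under gradient reversal.

    The senone gradient passes through unchanged; the domain gradient is
    multiplied by -lam (hypothetical weight), which pushes the shared
    features toward being uninformative about the domain label.
    """
    g_senone = backprop_linear(
        w_senone, cross_entropy_grad(matvec(w_senone, features), senone_id)
    )
    g_domain = backprop_linear(
        w_domain, cross_entropy_grad(matvec(w_domain, features), domain_id)
    )
    return [gs - lam * gd for gs, gd in zip(g_senone, g_domain)]


# Toy example: 2-dim shared features, 3 senones, 2 domains (0 = native, 1 = ESL).
features = [0.5, -1.0]
w_senone = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_domain = [[1.0, -1.0], [-1.0, 1.0]]
print(shared_feature_grad(features, w_senone, w_domain, senone_id=2, domain_id=1, lam=0.5))
```

Setting `lam` to 0 in this sketch removes the adversarial term and leaves only the senone gradient; larger values weight domain confusion more heavily. The domain classifier is only used during training, which is consistent with the abstract's remark that inference cost and model size do not increase.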

Original language: English
Title of host publication: 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 8122-8126
Number of pages: 5
ISBN (Electronic): 9781479981311
State: Published - May 2019
Event: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Brighton, United Kingdom
Duration: 12 May 2019 to 17 May 2019

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2019-May
ISSN (Print): 1520-6149

Conference

Conference: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Country/Territory: United Kingdom
City: Brighton
Period: 12/05/19 to 17/05/19

Keywords

  • ASR
  • CALL
  • Domain adversarial training
  • ESL
  • Keyword spotting
