Domain adversarial training for accented speech recognition

Sining Sun, Ching Feng Yeh, Mei Yuh Hwang, Mari Ostendorf, Lei Xie

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

94 Scopus citations

Abstract

In this paper, we propose a domain adversarial training (DAT) algorithm to alleviate the accented speech recognition problem. In order to reduce the mismatch between labeled source domain data ('standard' accent) and unlabeled target domain data (with heavy accents), we augment the learning objective for a Kaldi TDNN network with a domain adversarial training (DAT) objective to encourage the model to learn accent-invariant features. In experiments with three Mandarin accents, we show that DAT yields up to 7.45% relative character error rate reduction when we do not have transcriptions of the accented speech, compared with the baseline trained on standard accent data only. We also find a benefit from DAT when used in combination with training from automatic transcriptions on the accented data. Furthermore, we find that DAT is superior to multi-task learning for accented speech recognition.

Original languageEnglish
Title of host publication2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages4854-4858
Number of pages5
ISBN (Print)9781538646588
DOIs
StatePublished - 10 Sep 2018
Event2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
Duration: 15 Apr 201820 Apr 2018

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2018-April
ISSN (Print)1520-6149

Conference

Conference2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Country/TerritoryCanada
CityCalgary
Period15/04/1820/04/18

Keywords

  • Accent robust speech recognition
  • Domain adaptation
  • Domain adversarial training

Fingerprint

Dive into the research topics of 'Domain adversarial training for accented speech recognition'. Together they form a unique fingerprint.

Cite this