Domain adversarial training for accented speech recognition

Sining Sun; Ching Feng Yeh; Mei Yuh Hwang; Mari Ostendorf; Lei Xie

doi:10.1109/ICASSP.2018.8462663

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

95 Scopus citations

Abstract

In this paper, we propose a domain adversarial training (DAT) algorithm to alleviate the accented speech recognition problem. In order to reduce the mismatch between labeled source domain data ('standard' accent) and unlabeled target domain data (with heavy accents), we augment the learning objective for a Kaldi TDNN network with a domain adversarial training (DAT) objective to encourage the model to learn accent-invariant features. In experiments with three Mandarin accents, we show that DAT yields up to 7.45% relative character error rate reduction when we do not have transcriptions of the accented speech, compared with the baseline trained on standard accent data only. We also find a benefit from DAT when used in combination with training from automatic transcriptions on the accented data. Furthermore, we find that DAT is superior to multi-task learning for accented speech recognition.
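DAT is commonly implemented with a gradient reversal layer, following Ganin and Lempitsky (2015). In that formulation, a shared feature extractor feeds two heads. The label classifier is trained only on transcribed source-accent data with loss L_y. The domain (accent) classifier is trained on all frames with loss L_d. During backpropagation, the domain gradient that reaches the shared layers is multiplied by -λ, so those layers effectively minimize L_y - λ·L_d while the domain classifier itself minimizes L_d. The sketch below illustrates that mechanism in pure Python under heavy simplifying assumptions, not the paper's actual setup. It replaces the Kaldi TDNN with a linear feature extractor and two logistic heads, uses binary class labels, and uses a binary source/target domain label. All names (`dat_step`, `grad_reverse`, `lam`, `lr`) are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def grad_reverse(grad, lam):
    """Backward pass of a gradient reversal layer.

    The forward pass is the identity; on the way back the incoming
    gradient is scaled by -lam before it reaches the shared layers.
    """
    return [-lam * g for g in grad]


def dat_step(params, x, y, domain, lam=0.1, lr=0.1):
    """One SGD step of domain adversarial training on a single frame.

    params: (W, v_y, b_y, v_d, b_d), a hypothetical toy model with a linear
        feature extractor W and two logistic heads (label and domain).
    y: class label (0 or 1), or None for an untranscribed target-accent frame.
    domain: 0 for the source ('standard') accent, 1 for the target accent.
    """
    W, v_y, b_y, v_d, b_d = params
    h = matvec(W, x)  # shared features the model should make accent-invariant
    grad_h = [0.0] * len(h)

    # Label branch: only transcribed frames contribute (cross-entropy loss L_y).
    if y is not None:
        e_y = sigmoid(dot(v_y, h) + b_y) - y  # dL_y/dlogit
        grad_h = [g + e_y * v for g, v in zip(grad_h, v_y)]
        v_y = [v - lr * e_y * hk for v, hk in zip(v_y, h)]
        b_y = b_y - lr * e_y

    # Domain branch: every frame contributes (cross-entropy loss L_d).
    e_d = sigmoid(dot(v_d, h) + b_d) - domain  # dL_d/dlogit
    grad_h_domain = [e_d * v for v in v_d]
    # The domain classifier itself descends on L_d (it tries to tell accents apart).
    v_d = [v - lr * e_d * hk for v, hk in zip(v_d, h)]
    b_d = b_d - lr * e_d

    # The feature extractor receives the reversed domain gradient, so it
    # descends on L_y - lam * L_d, i.e. it ascends on the domain loss.
    grad_h = [g + r for g, r in zip(grad_h, grad_reverse(grad_h_domain, lam))]
    # dL/dW[i][j] = grad_h[i] * x[j]
    W = [[w - lr * gi * xj for w, xj in zip(row, x)] for row, gi in zip(W, grad_h)]
    return (W, v_y, b_y, v_d, b_d)
```

In a training loop, one would call `dat_step` for each frame, passing `y=None` for accented frames without transcriptions. When automatic transcriptions of the accented data are available, as in the combined setting mentioned in the abstract, those frames could pass their hypothesized labels instead. The value of `lam` and any schedule for it are design choices that this sketch does not take from the paper.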

Original language	English
Title of host publication	2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	4854-4858
Number of pages	5
ISBN (Print)	9781538646588
DOIs	https://doi.org/10.1109/ICASSP.2018.8462663
State	Published - 10 Sep 2018
Event	2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada. Duration: 15 Apr 2018 → 20 Apr 2018

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2018-April
ISSN (Print)	1520-6149

Conference

Conference	2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Country/Territory	Canada
City	Calgary
Period	15/04/18 → 20/04/18

Keywords

Accent robust speech recognition
Domain adaptation
Domain adversarial training

Cite this

Sun, S., Yeh, C. F., Hwang, M. Y., Ostendorf, M., & Xie, L. (2018). Domain adversarial training for accented speech recognition. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings (pp. 4854-4858). Article 8462663 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2018-April). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2018.8462663

Sun, Sining ; Yeh, Ching Feng ; Hwang, Mei Yuh et al. / Domain adversarial training for accented speech recognition. 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 4854-4858 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{dde45f52c296401cac4e4fd25d20a060,

title = "Domain adversarial training for accented speech recognition",

abstract = "In this paper, we propose a domain adversarial training (DAT) algorithm to alleviate the accented speech recognition problem. In order to reduce the mismatch between labeled source domain data ('standard' accent) and unlabeled target domain data (with heavy accents), we augment the learning objective for a Kaldi TDNN network with a domain adversarial training (DAT) objective to encourage the model to learn accent-invariant features. In experiments with three Mandarin accents, we show that DAT yields up to 7.45% relative character error rate reduction when we do not have transcriptions of the accented speech, compared with the baseline trained on standard accent data only. We also find a benefit from DAT when used in combination with training from automatic transcriptions on the accented data. Furthermore, we find that DAT is superior to multi-task learning for accented speech recognition.",

keywords = "Accent robust speech recognition, Domain adaptation, Domain adversarial training",

author = "Sining Sun and Yeh, {Ching Feng} and Hwang, {Mei Yuh} and Mari Ostendorf and Lei Xie",

note = "Publisher Copyright: {\textcopyright} 2018 IEEE.; 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 ; Conference date: 15-04-2018 Through 20-04-2018",

year = "2018",

month = sep,

day = "10",

doi = "10.1109/ICASSP.2018.8462663",

language = "English",

isbn = "9781538646588",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "4854--4858",

booktitle = "2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings",

}

Sun, S, Yeh, CF, Hwang, MY, Ostendorf, M & Xie, L 2018, Domain adversarial training for accented speech recognition. in 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings, 8462663, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2018-April, Institute of Electrical and Electronics Engineers Inc., pp. 4854-4858, 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018, Calgary, Canada, 15/04/18. https://doi.org/10.1109/ICASSP.2018.8462663

Domain adversarial training for accented speech recognition. / Sun, Sining; Yeh, Ching Feng; Hwang, Mei Yuh et al.
2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2018. p. 4854-4858, 8462663 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2018-April).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

TY - GEN

T1 - Domain adversarial training for accented speech recognition

AU - Sun, Sining

AU - Yeh, Ching Feng

AU - Hwang, Mei Yuh

AU - Ostendorf, Mari

AU - Xie, Lei

PY - 2018/9/10

Y1 - 2018/9/10

N2 - In this paper, we propose a domain adversarial training (DAT) algorithm to alleviate the accented speech recognition problem. In order to reduce the mismatch between labeled source domain data ('standard' accent) and unlabeled target domain data (with heavy accents), we augment the learning objective for a Kaldi TDNN network with a domain adversarial training (DAT) objective to encourage the model to learn accent-invariant features. In experiments with three Mandarin accents, we show that DAT yields up to 7.45% relative character error rate reduction when we do not have transcriptions of the accented speech, compared with the baseline trained on standard accent data only. We also find a benefit from DAT when used in combination with training from automatic transcriptions on the accented data. Furthermore, we find that DAT is superior to multi-task learning for accented speech recognition.

AB - In this paper, we propose a domain adversarial training (DAT) algorithm to alleviate the accented speech recognition problem. In order to reduce the mismatch between labeled source domain data ('standard' accent) and unlabeled target domain data (with heavy accents), we augment the learning objective for a Kaldi TDNN network with a domain adversarial training (DAT) objective to encourage the model to learn accent-invariant features. In experiments with three Mandarin accents, we show that DAT yields up to 7.45% relative character error rate reduction when we do not have transcriptions of the accented speech, compared with the baseline trained on standard accent data only. We also find a benefit from DAT when used in combination with training from automatic transcriptions on the accented data. Furthermore, we find that DAT is superior to multi-task learning for accented speech recognition.

KW - Accent robust speech recognition

KW - Domain adaptation

KW - Domain adversarial training

UR - http://www.scopus.com/inward/record.url?scp=85053740679&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2018.8462663

DO - 10.1109/ICASSP.2018.8462663

M3 - Conference contribution

AN - SCOPUS:85053740679

SN - 9781538646588

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 4854

EP - 4858

BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018

Y2 - 15 April 2018 through 20 April 2018

ER -

Sun S, Yeh CF, Hwang MY, Ostendorf M, Xie L. Domain adversarial training for accented speech recognition. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2018. p. 4854-4858. 8462663. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP.2018.8462663