TY - JOUR
T1 - Unsupervised adaptation with adversarial dropout regularization for robust speech recognition
AU - Guo, Pengcheng
AU - Sun, Sining
AU - Xie, Lei
N1 - Publisher Copyright:
Copyright © 2019 ISCA
PY - 2019
Y1 - 2019
N2 - Recent adversarial methods proposed for unsupervised domain adaptation of acoustic models try to fool a specific domain discriminator and learn both senone-discriminative and domain-invariant hidden feature representations. However, a drawback of these approaches is that the feature generator simply aligns different features into the same distribution without considering the class boundaries of the target domain data. Thus, ambiguous target domain features can be generated near the decision boundaries, decreasing speech recognition performance. In this study, we propose to use Adversarial Dropout Regularization (ADR) in acoustic modeling to overcome the foregoing issue. Specifically, we optimize the senone classifier to make its decision boundaries lie in the class boundaries of unlabeled target data. Then, the feature generator learns to create features far away from the decision boundaries, which are more discriminative. We apply the ADR approach on the CHiME-3 corpus and the proposed method yields up to 12.9% relative WER reductions compared with the baseline trained on source domain data only and further improvement over the widely used gradient reversal layer method.
AB - Recent adversarial methods proposed for unsupervised domain adaptation of acoustic models try to fool a specific domain discriminator and learn both senone-discriminative and domain-invariant hidden feature representations. However, a drawback of these approaches is that the feature generator simply aligns different features into the same distribution without considering the class boundaries of the target domain data. Thus, ambiguous target domain features can be generated near the decision boundaries, decreasing speech recognition performance. In this study, we propose to use Adversarial Dropout Regularization (ADR) in acoustic modeling to overcome the foregoing issue. Specifically, we optimize the senone classifier to make its decision boundaries lie in the class boundaries of unlabeled target data. Then, the feature generator learns to create features far away from the decision boundaries, which are more discriminative. We apply the ADR approach on the CHiME-3 corpus and the proposed method yields up to 12.9% relative WER reductions compared with the baseline trained on source domain data only and further improvement over the widely used gradient reversal layer method.
KW - Adversarial training
KW - Domain adaptation
KW - Robust speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85074733487&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2019-2544
DO - 10.21437/Interspeech.2019-2544
M3 - 会议文章
AN - SCOPUS:85074733487
SN - 2308-457X
VL - 2019-September
SP - 749
EP - 753
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019
Y2 - 15 September 2019 through 19 September 2019
ER -