An unsupervised deep domain adaptation approach for robust speech recognition

Sining Sun, Binbin Zhang, Lei Xie, Yanning Zhang

Research output: Contribution to journal › Article › peer-review

147 Scopus citations

Abstract

This paper addresses the robust speech recognition problem as a domain adaptation task. Specifically, we introduce an unsupervised deep domain adaptation (DDA) approach to acoustic modeling in order to eliminate the training–testing mismatch that is common in real-world use of speech recognition. Under a multi-task learning framework, the approach jointly learns two discriminative classifiers using one deep neural network (DNN). As the main task, a label predictor predicts phoneme labels and is used both during training and at test time. As the second task, a domain classifier discriminates between the source and the target domains during training. The network is optimized by simultaneously minimizing the loss of the label classifier and maximizing the loss of the domain classifier. The proposed approach is easy to implement by modifying a common feed-forward network. Moreover, this unsupervised approach only needs labeled training data from the source domain and some unlabeled raw data from the new domain. Speech recognition experiments on noise/channel distortion and domain shift confirm the effectiveness of the proposed approach. For instance, on the Aurora-4 corpus, compared with an acoustic model trained only on clean data, the DDA approach achieves a relative 37.8% word error rate (WER) reduction.
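The opposing objectives described above (minimize the label loss, maximize the domain loss with respect to the shared features) are commonly realized with a gradient-reversal trick: the forward pass is the identity, while the backward pass flips the sign of the gradient flowing from the domain classifier into the shared layers. Below is a minimal NumPy sketch of that idea; the function names, toy values, and the scaling factor `lam` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def grad_reverse_forward(x):
    # Forward pass: identity. The domain classifier sees the shared
    # features unchanged.
    return x

def grad_reverse_backward(grad, lam=1.0):
    # Backward pass: flip the sign (scaled by lam). Gradient descent then
    # *minimizes* the domain loss w.r.t. the domain classifier's own weights
    # but *maximizes* it w.r.t. the shared feature extractor, encouraging
    # domain-invariant features.
    return -lam * grad

# Toy demonstration (hypothetical values): the gradient from the domain
# head is reversed before it reaches the shared layers.
grad_from_domain_head = np.array([0.5, -0.2, 1.0])
reversed_grad = grad_reverse_backward(grad_from_domain_head, lam=0.1)
print(reversed_grad)  # sign-flipped and scaled by lam
```

In effect, the shared network is trained on the combined objective "label loss minus lambda times domain loss", which is why a single feed-forward DNN with this one extra layer suffices.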

Original language: English
Pages (from-to): 79-87
Number of pages: 9
Journal: Neurocomputing
Volume: 257
DOIs
State: Published - 27 Sep 2017

Keywords

  • Deep learning
  • Deep neural network
  • Domain adaptation
  • Robust speech recognition
