Adversarial regularization for attention based end-to-end robust speech recognition

Sining Sun; Pengcheng Guo; Lei Xie; Mei Yuh Hwang

doi:10.1109/TASLP.2019.2933146

School of Computer Science

Research output: Contribution to journal › Article › peer-review

26 Citations (Scopus)

Abstract

End-to-end speech recognition, including attention-based approaches, has emerged as an attractive topic in recent years and has achieved performance comparable to that of the traditional speech recognition framework. Because end-to-end approaches integrate acoustic and linguistic information into one model, perturbations at the acoustic level, such as acoustic noise, can easily propagate to the linguistic level. Improving the robustness of these end-to-end systems in real application environments is therefore crucial. In this paper, to make the attention-based end-to-end model more robust against noise, we regularize the objective function with adversarial training examples. In particular, two adversarial regularization techniques, the fast gradient-sign method and the local distributional smoothness method, are explored to improve noise robustness. Experiments on two publicly available Mandarin Chinese corpora, AISHELL-1 and AISHELL-2, show that adversarial regularization is an effective approach to improving the noise robustness of our attention-based models. Specifically, we obtained an 18.4% relative character error rate (CER) reduction on the AISHELL-1 noisy test set. Even on the clean test set, we obtained a 16.7% relative improvement. As the training set grows and covers more environmental variety, our proposed methods remain effective, although the improvement shrinks. Training on the large AISHELL-2 training corpus and testing on the various AISHELL-2 test sets, we achieved a 7.0%-12.2% relative error rate reduction. To our knowledge, this is the first successful application of adversarial regularization to sequence-to-sequence speech recognition systems.

Original language	English
Article number	3370726
Pages (from-to)	1826-1838
Number of pages	13
Journal	IEEE/ACM Transactions on Audio Speech and Language Processing
Volume	27
Issue	11
DOI	https://doi.org/10.1109/TASLP.2019.2933146
Publication status	Published - Nov 2019

Cite this

@article{a3a0140760824165bcc0521a97504a38,

title = "Adversarial regularization for attention based end-to-end robust speech recognition",

keywords = "Adversarial training, Attention, Cross entropy, Listen Attend and Spell, Sequence-to-sequence, Virtual adversarial training",

author = "Sining Sun and Pengcheng Guo and Lei Xie and Hwang, {Mei Yuh}",

year = "2019",

month = nov,

doi = "10.1109/TASLP.2019.2933146",

language = "English",

volume = "27",

pages = "1826--1838",

journal = "IEEE/ACM Transactions on Audio Speech and Language Processing",

issn = "2329-9290",

publisher = "IEEE Advancing Technology for Humanity",

number = "11",

}

TY - JOUR

T1 - Adversarial regularization for attention based end-to-end robust speech recognition

AU - Sun, Sining

AU - Guo, Pengcheng

AU - Xie, Lei

AU - Hwang, Mei Yuh

PY - 2019/11

Y1 - 2019/11

KW - Adversarial training

KW - Attention

KW - Cross entropy

KW - Listen Attend and Spell

KW - Sequence-to-sequence

KW - Virtual adversarial training

UR - http://www.scopus.com/inward/record.url?scp=85075011136&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2019.2933146

DO - 10.1109/TASLP.2019.2933146

M3 - Article

AN - SCOPUS:85075011136

SN - 2329-9290

VL - 27

SP - 1826

EP - 1838

JO - IEEE/ACM Transactions on Audio Speech and Language Processing

JF - IEEE/ACM Transactions on Audio Speech and Language Processing

IS - 11

M1 - 3370726

ER -
