Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing

Chenggang Mi; Lei Xie; Yanning Zhang

doi:10.1016/j.neunet.2022.01.016

School of Computer Science

Xi'an International Studies University

Research output: Contribution to journal › Article › peer-review

26 Scopus citations

Abstract

High-quality end-to-end speech translation models rely on large amounts of speech-to-text training data, which are usually scarce or even unavailable for some low-resource language pairs. To overcome this, we propose a target-side data augmentation method for low-resource speech translation. In particular, we first generate large-scale target-side paraphrases with a paraphrase generation model that incorporates several statistical machine translation (SMT) features and the commonly used recurrent neural network (RNN) feature. Then, a filtering model that combines semantic similarity and speech–word pair co-occurrence is used to select the highest-scoring source speech–target paraphrase pairs from the candidates. Experimental results on English, Arabic, German, Latvian, Estonian, Slovenian and Swedish paraphrase generation show that the proposed method achieves significant and consistent improvements over several strong baseline models on PPDB datasets (http://paraphrase.org/). To introduce the paraphrase generation results into low-resource speech translation, we propose two strategies: audio–text pair recombination and multiple-reference training. Experimental results show that speech translation models trained on the new audio–text datasets, which incorporate the paraphrase generation results, achieve substantial improvements over baselines, especially on low-resource languages.
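The filter-then-recombine idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`jaccard`, `recombine`), the `top_k` and `min_score` parameters, and the data layout are all hypothetical, and a simple token-overlap score stands in for the paper's filtering model (semantic similarity plus speech–word pair co-occurrence), which the abstract does not specify in detail.

```python
from typing import Callable, Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Toy stand-in for a semantic-similarity score: token-set overlap.

    Assumption: the paper's filter is more elaborate; this only
    illustrates where such a score would plug in.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def recombine(
    pairs: List[Tuple[str, str]],
    candidates: Dict[str, List[str]],
    score: Callable[[str, str], float] = jaccard,
    top_k: int = 2,
    min_score: float = 0.3,
) -> List[Tuple[str, str]]:
    """Pair each audio clip with its reference plus its best paraphrases.

    pairs:      (audio_id, reference_translation) tuples.
    candidates: reference_translation -> generated paraphrase candidates.
    Returns the original pairs interleaved with new (audio_id, paraphrase)
    pairs that pass the score threshold, at most top_k per clip.
    """
    augmented: List[Tuple[str, str]] = []
    for audio_id, reference in pairs:
        augmented.append((audio_id, reference))
        # Score every candidate against the reference, dropping exact copies.
        scored = [
            (score(reference, cand), cand)
            for cand in candidates.get(reference, [])
            if cand != reference
        ]
        # Keep candidates above the threshold, best first.
        kept = sorted(
            (sc for sc in scored if sc[0] >= min_score),
            key=lambda sc: sc[0],
            reverse=True,
        )
        augmented.extend((audio_id, cand) for _, cand in kept[:top_k])
    return augmented


if __name__ == "__main__":
    pairs = [("utt1.wav", "the cat sat on the mat")]
    candidates = {
        "the cat sat on the mat": [
            "the cat was sitting on the mat",
            "a cat sat on a mat",
            "dogs bark loudly",
        ]
    }
    for audio_id, text in recombine(pairs, candidates):
        print(audio_id, "->", text)
```

In this toy run the unrelated candidate is filtered out and the clip is paired with its original reference plus the two closest paraphrases. The same selected paraphrases could instead be kept as a list per clip to serve as multiple references during training.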

Original language	English
Pages (from-to)	194-205
Number of pages	12
Journal	Neural Networks
Volume	148
DOIs	https://doi.org/10.1016/j.neunet.2022.01.016
State	Published - Apr 2022

Keywords

Data augmentation
Paraphrasing
Speech translation

Access to Document

10.1016/j.neunet.2022.01.016

Cite this

@article{6fc26352fd454785aafc2a38b6920c71,

title = "Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing",

abstract = "High-quality end-to-end speech translation models rely on large amounts of speech-to-text training data, which are usually scarce or even unavailable for some low-resource language pairs. To overcome this, we propose a target-side data augmentation method for low-resource speech translation. In particular, we first generate large-scale target-side paraphrases with a paraphrase generation model that incorporates several statistical machine translation (SMT) features and the commonly used recurrent neural network (RNN) feature. Then, a filtering model that combines semantic similarity and speech–word pair co-occurrence is used to select the highest-scoring source speech–target paraphrase pairs from the candidates. Experimental results on English, Arabic, German, Latvian, Estonian, Slovenian and Swedish paraphrase generation show that the proposed method achieves significant and consistent improvements over several strong baseline models on PPDB datasets (http://paraphrase.org/). To introduce the paraphrase generation results into low-resource speech translation, we propose two strategies: audio–text pair recombination and multiple-reference training. Experimental results show that speech translation models trained on the new audio–text datasets, which incorporate the paraphrase generation results, achieve substantial improvements over baselines, especially on low-resource languages.",

keywords = "Data augmentation, Paraphrasing, Speech translation",

author = "Chenggang Mi and Lei Xie and Yanning Zhang",

note = "Publisher Copyright: {\textcopyright} 2022 Elsevier Ltd",

year = "2022",

month = apr,

doi = "10.1016/j.neunet.2022.01.016",

language = "English",

volume = "148",

pages = "194--205",

journal = "Neural Networks",

issn = "0893-6080",

publisher = "Elsevier Ltd",

}

TY - JOUR

T1 - Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing

AU - Mi, Chenggang

AU - Xie, Lei

AU - Zhang, Yanning

PY - 2022/4

Y1 - 2022/4

N2 - High-quality end-to-end speech translation models rely on large amounts of speech-to-text training data, which are usually scarce or even unavailable for some low-resource language pairs. To overcome this, we propose a target-side data augmentation method for low-resource speech translation. In particular, we first generate large-scale target-side paraphrases with a paraphrase generation model that incorporates several statistical machine translation (SMT) features and the commonly used recurrent neural network (RNN) feature. Then, a filtering model that combines semantic similarity and speech–word pair co-occurrence is used to select the highest-scoring source speech–target paraphrase pairs from the candidates. Experimental results on English, Arabic, German, Latvian, Estonian, Slovenian and Swedish paraphrase generation show that the proposed method achieves significant and consistent improvements over several strong baseline models on PPDB datasets (http://paraphrase.org/). To introduce the paraphrase generation results into low-resource speech translation, we propose two strategies: audio–text pair recombination and multiple-reference training. Experimental results show that speech translation models trained on the new audio–text datasets, which incorporate the paraphrase generation results, achieve substantial improvements over baselines, especially on low-resource languages.

AB - High-quality end-to-end speech translation models rely on large amounts of speech-to-text training data, which are usually scarce or even unavailable for some low-resource language pairs. To overcome this, we propose a target-side data augmentation method for low-resource speech translation. In particular, we first generate large-scale target-side paraphrases with a paraphrase generation model that incorporates several statistical machine translation (SMT) features and the commonly used recurrent neural network (RNN) feature. Then, a filtering model that combines semantic similarity and speech–word pair co-occurrence is used to select the highest-scoring source speech–target paraphrase pairs from the candidates. Experimental results on English, Arabic, German, Latvian, Estonian, Slovenian and Swedish paraphrase generation show that the proposed method achieves significant and consistent improvements over several strong baseline models on PPDB datasets (http://paraphrase.org/). To introduce the paraphrase generation results into low-resource speech translation, we propose two strategies: audio–text pair recombination and multiple-reference training. Experimental results show that speech translation models trained on the new audio–text datasets, which incorporate the paraphrase generation results, achieve substantial improvements over baselines, especially on low-resource languages.

KW - Data augmentation

KW - Paraphrasing

KW - Speech translation

UR - http://www.scopus.com/inward/record.url?scp=85124238365&partnerID=8YFLogxK

U2 - 10.1016/j.neunet.2022.01.016

DO - 10.1016/j.neunet.2022.01.016

M3 - Article

C2 - 35151006

AN - SCOPUS:85124238365

SN - 0893-6080

VL - 148

SP - 194

EP - 205

JO - Neural Networks

JF - Neural Networks

ER -
