Multilingual bottle-neck feature learning from untranscribed speech

Hongjie Chen; Cheung Chi Leung; Lei Xie; Bin Ma; Haizhou Li

doi:10.1109/ASRU.2017.8269009

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

27 Scopus citations

Abstract

We propose to learn a low-dimensional feature representation for multiple languages without access to their manual transcription. The multilingual features are extracted from a shared bottleneck layer of a multi-task learning deep neural network which is trained using unsupervised phoneme-like labels. The unsupervised phoneme-like labels are obtained from language-dependent Dirichlet process Gaussian mixture models (DPGMMs). Vocal tract length normalization (VTLN) is applied to mel-frequency cepstral coefficients to reduce talker variation when DPGMMs are trained. The proposed features are evaluated using the ABX phoneme discriminability test in the Zero Resource Speech Challenge 2017. In the experiments, we show that the proposed features perform well across different languages, and they consistently outperform our previously proposed DPGMM posteriorgrams, which topped the performance in the same challenge in 2015.
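
The ABX discriminability test mentioned in the abstract can be illustrated with a small sketch. This is not the challenge's evaluation code: it assumes each speech token has already been reduced to a single fixed-length feature vector and uses Euclidean distance, whereas the actual evaluation compares variable-length frame sequences. The function names `euclidean` and `abx_error_rate` are hypothetical and chosen only for this example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def abx_error_rate(tokens_a, tokens_b, dist=euclidean):
    """Simplified ABX error rate for two phoneme categories.

    For every triple (A, B, X), where A and X are distinct tokens of the
    first category and B is a token of the second category, the triple
    counts as an error if X lies closer to B than to A. Ties count as
    half an error. Assumption: one fixed-length vector per token.
    """
    errors = 0.0
    total = 0
    for i, a in enumerate(tokens_a):
        for j, x in enumerate(tokens_a):
            if i == j:
                continue  # A and X must be different tokens
            d_ax = dist(a, x)
            for b in tokens_b:
                d_bx = dist(b, x)
                if d_ax > d_bx:
                    errors += 1.0
                elif d_ax == d_bx:
                    errors += 0.5
                total += 1
    if total == 0:
        raise ValueError("need at least two tokens in A and one in B")
    return errors / total


# Toy features: two well-separated categories, so no triple is misjudged.
same_category = [[0.0, 0.0], [0.0, 1.0]]
other_category = [[5.0, 5.0], [6.0, 5.0]]
rate = abx_error_rate(same_category, other_category)
```

A lower error rate means the feature space separates the two categories better, which is the sense in which one feature representation can be said to outperform another on this test.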

Original language	English
Title of host publication	2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	727-733
Number of pages	7
ISBN (Electronic)	9781509047888
DOIs	https://doi.org/10.1109/ASRU.2017.8269009
State	Published - 2 Jul 2017
Event	2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Okinawa, Japan Duration: 16 Dec 2017 → 20 Dec 2017

Publication series

Name	2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
Volume	2018-January

Conference

Conference	2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017
Country/Territory	Japan
City	Okinawa
Period	16/12/17 → 20/12/17

Keywords

low/zero-resource
Multi-task learning
multilingual feature
unsupervised feature learning

Cite this

Chen, H., Leung, C. C., Xie, L., Ma, B., & Li, H. (2017). Multilingual bottle-neck feature learning from untranscribed speech. In 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings (pp. 727-733). (2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings; Vol. 2018-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU.2017.8269009

Chen, Hongjie ; Leung, Cheung Chi ; Xie, Lei et al. / Multilingual bottle-neck feature learning from untranscribed speech. 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 727-733 (2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings).

@inproceedings{334a483ae5b240169a30e0ea5aa0f632,

title = "Multilingual bottle-neck feature learning from untranscribed speech",

abstract = "We propose to learn a low-dimensional feature representation for multiple languages without access to their manual transcription. The multilingual features are extracted from a shared bottleneck layer of a multi-task learning deep neural network which is trained using unsupervised phoneme-like labels. The unsupervised phoneme-like labels are obtained from language-dependent Dirichlet process Gaussian mixture models (DPGMMs). Vocal tract length normalization (VTLN) is applied to mel-frequency cepstral coefficients to reduce talker variation when DPGMMs are trained. The proposed features are evaluated using the ABX phoneme discriminability test in the Zero Resource Speech Challenge 2017. In the experiments, we show that the proposed features perform well across different languages, and they consistently outperform our previously proposed DPGMM posteriorgrams, which topped the performance in the same challenge in 2015.",

keywords = "low/zero-resource, Multi-task learning, multilingual feature, unsupervised feature learning",

author = "Hongjie Chen and Leung, {Cheung Chi} and Lei Xie and Bin Ma and Haizhou Li",

note = "Publisher Copyright: {\textcopyright} 2017 IEEE.; 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 ; Conference date: 16-12-2017 Through 20-12-2017",

year = "2017",

month = jul,

day = "2",

doi = "10.1109/ASRU.2017.8269009",

language = "English",

series = "2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "727--733",

booktitle = "2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings",

}

Chen, H, Leung, CC, Xie, L, Ma, B & Li, H 2017, Multilingual bottle-neck feature learning from untranscribed speech. in 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings. 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings, vol. 2018-January, Institute of Electrical and Electronics Engineers Inc., pp. 727-733, 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017, Okinawa, Japan, 16/12/17. https://doi.org/10.1109/ASRU.2017.8269009

Multilingual bottle-neck feature learning from untranscribed speech. / Chen, Hongjie; Leung, Cheung Chi; Xie, Lei et al.
2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2017. p. 727-733 (2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings; Vol. 2018-January).

TY - GEN

T1 - Multilingual bottle-neck feature learning from untranscribed speech

AU - Chen, Hongjie

AU - Leung, Cheung Chi

AU - Xie, Lei

AU - Ma, Bin

AU - Li, Haizhou

PY - 2017/7/2

Y1 - 2017/7/2

N2 - We propose to learn a low-dimensional feature representation for multiple languages without access to their manual transcription. The multilingual features are extracted from a shared bottleneck layer of a multi-task learning deep neural network which is trained using unsupervised phoneme-like labels. The unsupervised phoneme-like labels are obtained from language-dependent Dirichlet process Gaussian mixture models (DPGMMs). Vocal tract length normalization (VTLN) is applied to mel-frequency cepstral coefficients to reduce talker variation when DPGMMs are trained. The proposed features are evaluated using the ABX phoneme discriminability test in the Zero Resource Speech Challenge 2017. In the experiments, we show that the proposed features perform well across different languages, and they consistently outperform our previously proposed DPGMM posteriorgrams, which topped the performance in the same challenge in 2015.

AB - We propose to learn a low-dimensional feature representation for multiple languages without access to their manual transcription. The multilingual features are extracted from a shared bottleneck layer of a multi-task learning deep neural network which is trained using unsupervised phoneme-like labels. The unsupervised phoneme-like labels are obtained from language-dependent Dirichlet process Gaussian mixture models (DPGMMs). Vocal tract length normalization (VTLN) is applied to mel-frequency cepstral coefficients to reduce talker variation when DPGMMs are trained. The proposed features are evaluated using the ABX phoneme discriminability test in the Zero Resource Speech Challenge 2017. In the experiments, we show that the proposed features perform well across different languages, and they consistently outperform our previously proposed DPGMM posteriorgrams, which topped the performance in the same challenge in 2015.

KW - low/zero-resource

KW - Multi-task learning

KW - multilingual feature

KW - unsupervised feature learning

UR - http://www.scopus.com/inward/record.url?scp=85050558543&partnerID=8YFLogxK

U2 - 10.1109/ASRU.2017.8269009

DO - 10.1109/ASRU.2017.8269009

M3 - Conference contribution

AN - SCOPUS:85050558543

T3 - 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings

SP - 727

EP - 733

BT - 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017

Y2 - 16 December 2017 through 20 December 2017

ER -

Chen H, Leung CC, Xie L, Ma B, Li H. Multilingual bottle-neck feature learning from untranscribed speech. In 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2017. p. 727-733. (2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings). doi: 10.1109/ASRU.2017.8269009
