Multilingual bottle-neck feature learning from untranscribed speech

Hongjie Chen, Cheung Chi Leung, Lei Xie, Bin Ma, Haizhou Li

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

27 Scopus citations

Abstract

We propose to learn a low-dimensional feature representation for multiple languages without access to their manual transcription. The multilingual features are extracted from a shared bottleneck layer of a multi-task learning deep neural network which is trained using un-supervised phoneme-like labels. The unsupervised phoneme-like labels are obtained from language-dependent Dirichlet process Gaussian mixture models (DPGMMs). Vocal tract length normalization (VTLN) is applied to mel-frequency cepstral coefficients to reduce talker variation when DPGMMs are trained. The proposed features are evaluated using the ABX phoneme discriminability test in the Zero Resource Speech Challenge 2017. In the experiments, we show that the proposed features perform well across different languages, and they consistently outperform our previously proposed DPGMM posteriorgrams which topped the performance in the same challenge in 2015.

Original languageEnglish
Title of host publication2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages727-733
Number of pages7
ISBN (Electronic)9781509047888
DOIs
StatePublished - 2 Jul 2017
Event2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Okinawa, Japan
Duration: 16 Dec 201720 Dec 2017

Publication series

Name2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
Volume2018-January

Conference

Conference2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017
Country/TerritoryJapan
CityOkinawa
Period16/12/1720/12/17

Keywords

  • low/zero-resource
  • Multi-task learning
  • multilingual feature
  • unsupervised feature learning

Fingerprint

Dive into the research topics of 'Multilingual bottle-neck feature learning from untranscribed speech'. Together they form a unique fingerprint.

Cite this