Extracting bottleneck features and word-like pairs from untranscribed speech for feature representation

Yougen Yuan, Cheung Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Citations (Scopus)

Abstract

We propose a framework to learn a frame-level speech representation in a scenario where no manual transcription is available. Our framework is based on pairwise learning using bottleneck features (BNFs). Initial frame-level features are extracted from a bottleneck-shaped multilingual deep neural network (DNN) which is trained with unsupervised phoneme-like labels. Word-like pairs are discovered in the untranscribed speech using the initial features, and frame alignment is performed on each word-like speech pair. The matching frame pairs are used as input-output to train another DNN with the mean square error (MSE) loss function. The final frame-level features are extracted from an internal hidden layer of the MSE-based DNN. Our pairwise learned feature representation is evaluated on the ZeroSpeech 2017 challenge. The experiments show that pairwise learning improves phoneme discrimination in the 10s and 120s test conditions. We find that it is important to use BNFs as initial features when pairwise learning is performed. With more word pairs obtained from the Switchboard corpus and its manual transcription, the phoneme discrimination of three languages in the evaluation data can be further improved despite data mismatch.
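
The frame alignment step can be illustrated with a minimal sketch. It assumes dynamic time warping (DTW) with a Euclidean frame distance as the alignment method, which the abstract does not specify; the function names `frame_distance`, `dtw_align`, and `matching_frame_pairs` are hypothetical and chosen only for this example. Given two feature sequences for a word-like pair, it returns the matched frame pairs that could then serve as input-output examples for an MSE-trained network.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(seq_a, seq_b):
    """Align two frame sequences with DTW; return a list of (i, j) index pairs."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the best cumulative cost of aligning the first i frames
    # of seq_a with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end of both sequences to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal move
            (cost[i - 1][j], i - 1, j),          # advance in seq_a only
            (cost[i][j - 1], i, j - 1),          # advance in seq_b only
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


def matching_frame_pairs(seq_a, seq_b):
    """Return (input_frame, target_frame) pairs along the DTW path."""
    return [(seq_a[i], seq_b[j]) for i, j in dtw_align(seq_a, seq_b)]


# Toy one-dimensional "features" standing in for bottleneck features.
word_a = [[0.0], [1.0], [2.0]]
word_b = [[0.0], [0.0], [1.0], [2.0]]
pairs = matching_frame_pairs(word_a, word_b)
```

In this toy case the first frame of `word_a` is matched to the two leading frames of `word_b`, so the shorter sequence is stretched rather than truncated. Real features would be multi-dimensional vectors, and the paper's actual alignment procedure may differ from this sketch.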

Original language: English
Title of host publication: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 734-739
Number of pages: 6
ISBN (Electronic): 9781509047888
DOI
Publication status: Published - 2 Jul 2017
Event: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Okinawa, Japan
Duration: 16 Dec 2017 to 20 Dec 2017

Publication series

Name: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
Volume: 2018-January

Conference

Conference: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017
Country/Territory: Japan
City: Okinawa
Period: 16/12/17 to 20/12/17
