Unsupervised Bottleneck features for low-resource query-by-example spoken term detection

Hongjie Chen, Cheung Chi Leung, Lei Xie, Bin Ma, Haizhou Li

Research output: Contribution to journal › Conference article › Peer-review

40 Scopus citations

Abstract

We propose a framework that ports Dirichlet process Gaussian mixture model (DPGMM) based labels to a deep neural network (DNN). The DNN trained on these unsupervised labels is used to extract a low-dimensional unsupervised speech representation, named unsupervised bottleneck features (uBNFs), which capture considerable information for sound cluster discrimination. We investigate the performance of uBNFs in query-by-example spoken term detection (QbE-STD) on the TIMIT English speech corpus. Our uBNFs perform comparably with cross-lingual bottleneck features (BNFs) extracted from a DNN trained on 171 hours of transcribed telephone speech in another language (Mandarin Chinese). With score fusion of uBNFs and cross-lingual BNFs, we gain about 10% relative improvement in mean average precision (MAP) compared with the cross-lingual BNFs alone. We also study the performance of the framework with different input features and different lengths of temporal context.
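The evaluation step summarised above (fusing detection scores from two feature types, then comparing systems by MAP and relative improvement) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how scores are normalised or combined, so the per-query z-normalisation and the linear fusion with `weight=0.5` (`z_normalise`, `fuse`) are hypothetical choices. Each utterance's detection score is also assumed to be higher when the query term is more likely present.

```python
from typing import List, Sequence, Tuple


def z_normalise(scores: Sequence[float]) -> List[float]:
    """Zero-mean, unit-variance normalisation of one query's scores (assumed step)."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    std = var ** 0.5 or 1.0  # avoid dividing by zero for constant scores
    return [(s - mean) / std for s in scores]


def fuse(scores_a: Sequence[float], scores_b: Sequence[float],
         weight: float = 0.5) -> List[float]:
    """Hypothetical linear fusion: weight * norm(a) + (1 - weight) * norm(b)."""
    a, b = z_normalise(scores_a), z_normalise(scores_b)
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


def average_precision(scores: Sequence[float], relevant: Sequence[bool]) -> float:
    """AP for one query: rank utterances by score, average the precision at each hit."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, idx in enumerate(order, start=1):
        if relevant[idx]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
        queries: Sequence[Tuple[Sequence[float], Sequence[bool]]]) -> float:
    """MAP: mean of per-query AP over all (scores, relevance) pairs."""
    return sum(average_precision(s, r) for s, r in queries) / len(queries)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain over a baseline; 0.10 corresponds to a 10% relative improvement."""
    return (new - baseline) / baseline
```

In use, one would score every test utterance against each query with both the uBNF-based and the BNF-based systems, call `fuse` per query, and compare `mean_average_precision` of the fused scores against the BNF-only scores with `relative_improvement`. Normalising per query before fusing keeps one system's score range from dominating the sum, which is a common reason for this design choice.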

Original language: English
Pages (from-to): 923-927
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 08-12-September-2016
State: Published - 2016
Event: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States
Duration: 8 Sep 2016 to 16 Sep 2016

Keywords

  • Bottleneck feature
  • Dirichlet process Gaussian mixture model
  • Low-resource speech processing
  • Spoken term detection
  • Unsupervised feature learning
