Abstract
We propose a novel technique that learns a low-dimensional feature representation from unlabeled data of a target language and labeled data from a nontarget language. The technique is studied as a solution to query-by-example spoken term detection (QbE-STD) for a low-resource language. We extract low-dimensional features from a bottle-neck layer of a multitask deep neural network, which is jointly trained on speech data from the low-resource target language and a resource-rich nontarget language. The proposed feature learning technique aims to extract acoustic features that offer phonetic discriminability, and it explores a new way of leveraging cross-lingual speech data to overcome the resource limitation in the target language. We conduct QbE-STD experiments using the dynamic time warping (DTW) distance of the multitask bottle-neck features between the query and the search database. The QbE-STD process does not rely on an automatic speech recognition pipeline for the target language. We validate the effectiveness of multitask feature learning through a series of comparative experiments.
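The detection step described above matches a spoken query against the search database by DTW over frame-level feature sequences. The sketch below illustrates that matching step only, assuming the bottle-neck features have already been extracted; the function names, the cosine frame distance, and the path-length normalization are illustrative choices, not necessarily the exact configuration used in the paper.

```python
import math

def cosine_distance(x, y):
    # 1 - cosine similarity between two feature vectors (frames)
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)

def dtw_distance(query, segment):
    """Length-normalized DTW distance between two feature sequences,
    each a list of frame vectors (e.g. bottle-neck features).
    Lower distance = better match for QbE-STD ranking."""
    n, m = len(query), len(segment)
    INF = float("inf")
    # cost[i][j]: min accumulated distance aligning query[:i] with segment[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(query[i - 1], segment[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a query frame
                                 cost[i][j - 1],      # skip a segment frame
                                 cost[i - 1][j - 1])  # align the two frames
    # normalize by an upper bound on the warping-path length
    return cost[n][m] / (n + m)
```

In a full QbE-STD system this distance would be computed between the query and candidate windows of the search database, and detections ranked by ascending distance.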
| | |
|---|---|
| Original language | English |
| Article number | 8070974 |
| Pages (from-to) | 1329-1339 |
| Number of pages | 11 |
| Journal | IEEE Journal on Selected Topics in Signal Processing |
| Volume | 11 |
| Issue number | 8 |
| DOIs | |
| State | Published - Dec 2017 |
Keywords
- bottle-neck feature
- multi-task learning
- query-by-example
- spoken term detection
Title: Multitask Feature Learning for Low-Resource Query-by-Example Spoken Term Detection