Verifying Deep Keyword Spotting Detection with Acoustic Word Embeddings

Yougen Yuan, Zhiqiang Lv, Shen Huang, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

To improve keyword spotting (KWS) performance in a live broadcast scenario, we propose a template matching method based on acoustic word embeddings (AWEs) as a second stage that verifies detections from the Deep KWS system. AWEs are obtained from a deep bidirectional long short-term memory (BLSTM) network trained on limited positive and negative keyword candidates; the network encodes variable-length keyword candidates into fixed-dimensional vectors with reasonable discriminative ability. Learning AWEs combines three specifically designed losses: the triplet and reversed triplet losses pull candidates of the same keyword closer together and push candidates of different keywords farther apart, while the hinge loss sets a fixed threshold that separates all positive keyword candidates from negative ones. During keyword verification, calibration scores are used to reduce the bias between different templates for different keyword candidates. Experiments show that adding AWE-based keyword verification to Deep KWS achieves a 5.6% relative accuracy improvement; the hinge loss brings an additional 5.5% relative gain, and the final accuracy climbs to 0.775 with calibration scores.
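
The abstract names the three losses but gives no formulas, so the following is only a minimal sketch of how such a combination might look, not the authors' implementation. Everything in it is an assumption made for illustration: the use of cosine distance, the margin and threshold values, the equal loss weights, the function names, and in particular the form of the reversed triplet loss (taken here as the same constraint measured from the positive candidate rather than from the template).

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(template, positive, negative, margin=0.4):
    """Penalize the template being nearer to the negative than to the positive."""
    d_pos = cosine_distance(template, positive)
    d_neg = cosine_distance(template, negative)
    return max(0.0, margin + d_pos - d_neg)


def reversed_triplet_loss(template, positive, negative, margin=0.4):
    """Assumed form: the same constraint, measured from the positive candidate."""
    d_pos = cosine_distance(positive, template)
    d_neg = cosine_distance(positive, negative)
    return max(0.0, margin + d_pos - d_neg)


def hinge_loss(template, candidate, is_positive, threshold=0.5):
    """Push positive pairs below a fixed distance threshold, negative pairs above it."""
    d = cosine_distance(template, candidate)
    return max(0.0, d - threshold) if is_positive else max(0.0, threshold - d)


def combined_loss(template, positive, negative, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three losses (the weights are hypothetical)."""
    w_triplet, w_reversed, w_hinge = weights
    hinge = (hinge_loss(template, positive, True)
             + hinge_loss(template, negative, False))
    return (w_triplet * triplet_loss(template, positive, negative)
            + w_reversed * reversed_triplet_loss(template, positive, negative)
            + w_hinge * hinge)


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real AWEs would come from the BLSTM encoder.
    template = [1.0, 0.0]
    positive = [0.9, 0.1]
    negative = [0.0, 1.0]
    print(combined_loss(template, positive, negative))
```

In this reading, the two triplet terms only constrain relative distances, whereas the hinge term ties distances to one absolute threshold, which is what would allow a single decision boundary to be shared across keywords at verification time.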

Original language: English
Title of host publication: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 613-620
Number of pages: 8
ISBN (Electronic): 9781728103068
DOIs
State: Published - Dec 2019
Event: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Singapore, Singapore
Duration: 15 Dec 2019 → 18 Dec 2019

Publication series

Name: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings

Conference

Conference: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019
Country/Territory: Singapore
City: Singapore
Period: 15/12/19 → 18/12/19

Keywords

  • acoustic word embeddings
  • calibration scores
  • hinge loss
  • Query-by-example
  • spotting
