Wake word detection with alignment-free lattice-free MMI

Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

12 Scopus citations

Abstract

Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input. We present novel methods to train a hybrid DNN/HMM wake word detection system from partially labeled training data, and to use it in on-line applications: (i) we remove the prerequisite of frame-level alignments in the LF-MMI training algorithm, permitting the use of un-transcribed training examples that are annotated only for the presence/absence of the wake word; (ii) we show that the classical keyword/filler model must be supplemented with an explicit non-speech (silence) model for good performance; (iii) we present an FST-based decoder to perform online detection. We evaluate our methods on two real data sets, showing 50%-90% reduction in false rejection rates at prespecified false alarm rates over the best previously published figures, and re-validate them on a third (large) data set.

Original languageEnglish
Title of host publicationInterspeech 2020
PublisherInternational Speech Communication Association
Pages4258-4262
Number of pages5
ISBN (Print)9781713820697
DOIs
StatePublished - 2020
Event21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 25 Oct 202029 Oct 2020

Publication series

NameProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume2020-October
ISSN (Print)2308-457X
ISSN (Electronic)1990-9772

Conference

Conference21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Country/TerritoryChina
CityShanghai
Period25/10/2029/10/20

Keywords

  • Alignment free
  • Lattice-free MMI
  • Wake word detection

Fingerprint

Dive into the research topics of 'Wake word detection with alignment-free lattice-free MMI'. Together they form a unique fingerprint.

Cite this