Approximate search of audio queries by using DTW with phone time boundary and data augmentation

Haihua Xu, Jingyong Hou, Xiong Xiao, Van Tung Pham, Cheung Chi Leung, Lei Wang, Van Hai Do, Hang Lv, Lei Xie, Bin Ma, Eng Siong Chng, Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Scopus citations

Abstract

Dynamic Time Warping (DTW) is widely used in language-independent query-by-example (QbE) spoken term detection (STD) tasks due to its high performance. However, DTW-based template matching has two limitations: 1) it is not straightforward to perform approximate matching of audio queries; 2) DTW is sensitive to mismatched signal conditions between the query and the speech search data. To allow approximate search, we propose a partial template matching strategy using phone time boundary information generated by a phone recognizer. To obtain a more invariant representation of audio signals, we use bottleneck features (BNF) as the input to DTW. The BNF network is trained on augmented data, which is generated by adding reverberation and additive noise to the clean training data. Experimental results on the QUESST 2015 task show the effectiveness of the proposed methods for QbE-STD when the queries and search data are both distorted by reverberation and noise.
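
The matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine frame distance, the function names `subsequence_dtw` and `partial_match`, the `min_phones` parameter, and the exhaustive search over contiguous phone spans are all assumptions made for illustration. The sketch aligns a query (a sequence of feature vectors such as BNFs) against any region of a longer utterance, then repeats the search for sub-templates cut at phone boundaries, so that a query differing from the spoken term in a few phones can still score well.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two feature vectors (1.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def subsequence_dtw(query, utterance, dist=cosine_distance):
    """Align the whole query to the best-matching region of the utterance.

    Returns (cost, start, end), where start and end are inclusive
    utterance frame indices of the matched region.
    """
    n, m = len(query), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("query and utterance must be non-empty")
    inf = float("inf")
    acc = [[inf] * m for _ in range(n)]   # accumulated alignment cost
    start = [[0] * m for _ in range(n)]   # utterance frame where the path began
    for j in range(m):
        # The match may begin at any utterance frame at no extra cost.
        acc[0][j] = dist(query[0], utterance[j])
        start[0][j] = j
    for i in range(1, n):
        for j in range(m):
            best_cost, best_start = acc[i - 1][j], start[i - 1][j]
            if j > 0:
                for cost, s in ((acc[i - 1][j - 1], start[i - 1][j - 1]),
                                (acc[i][j - 1], start[i][j - 1])):
                    if cost < best_cost:
                        best_cost, best_start = cost, s
            acc[i][j] = dist(query[i], utterance[j]) + best_cost
            start[i][j] = best_start
    # The match may also end at any utterance frame.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return acc[n - 1][end], start[n - 1][end], end


def partial_match(query, phone_boundaries, utterance, min_phones=2):
    """Search with every contiguous span of at least `min_phones` phones.

    `phone_boundaries` is a sorted list of query frame indices
    [0, b1, ..., len(query)], assumed to come from a phone recognizer.
    Returns (normalized_cost, (q_start, q_end), (u_start, u_end)).
    """
    best = None
    num_phones = len(phone_boundaries) - 1
    for a in range(num_phones):
        for b in range(a + min_phones, num_phones + 1):
            s, e = phone_boundaries[a], phone_boundaries[b]
            cost, us, ue = subsequence_dtw(query[s:e], utterance)
            norm = cost / (e - s)  # normalize by sub-template length
            if best is None or norm < best[0]:
                best = (norm, (s, e), (us, ue))
    return best
```

Normalizing by sub-template length is one simple way to compare spans of different sizes, but it tends to favor shorter spans; a practical system would likely need a minimum length or a score calibration step, which this sketch leaves out.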

Original language: English
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6030-6034
Number of pages: 5
ISBN (Electronic): 9781479999880
State: Published - 18 May 2016
Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: 20 Mar 2016 to 25 Mar 2016

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2016-May
ISSN (Print): 1520-6149

Conference

Conference: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Country/Territory: China
City: Shanghai
Period: 20/03/16 to 25/03/16

Keywords

  • data augmentation
  • DTW
  • partial matching
  • Query-by-example
  • spoken term detection
