Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection

Peng Yang, Cheung Chi Leung, Lei Xie, Bin Ma, Haizhou Li

Research output: Contribution to journal › Conference article › peer-review

22 Scopus citations

Abstract

We investigate the use of intrinsic spectral analysis (ISA) for query-by-example spoken term detection (QbE-STD). In this task, spoken queries and test utterances in an audio archive are converted to ISA features, and dynamic time warping is applied to match the feature sequence of each query against those of the test utterances. Motivated by manifold learning, ISA was proposed to recover, from untranscribed utterances, a set of nonlinear basis functions for the speech manifold, and it has been shown to offer improved phonetic separability and inherent speaker independence. Because of coarticulation in speech, we propose to use temporal context information when obtaining the ISA features. The Gaussian posteriorgram, an efficient acoustic representation commonly used in QbE-STD, serves as the baseline feature. Experimental results on the TIMIT speech corpus show that the ISA features provide a relative improvement of 13.5% in mean average precision over the baseline features when the temporal context information is used.
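The two generic steps named in the abstract, adding temporal context to frame-level features and matching sequences with dynamic time warping, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `stack_context` and `dtw_cost`, the context width `k`, the edge padding, the cosine frame distance, and the whole-sequence (rather than subsequence) alignment are all assumptions made for illustration, and the ISA projection itself is not shown.

```python
import numpy as np


def stack_context(feats, k):
    """Concatenate each frame with its k left and k right neighbours.

    feats: array of shape (T, D). Edge frames are padded by repetition
    (an assumption; other padding schemes are possible).
    Returns an array of shape (T, (2 * k + 1) * D).
    """
    padded = np.pad(feats, ((k, k), (0, 0)), mode="edge")
    num_frames = feats.shape[0]
    return np.hstack([padded[i:i + num_frames] for i in range(2 * k + 1)])


def dtw_cost(query, utt):
    """Length-normalised DTW cost between two feature sequences.

    Uses cosine distance between frames (an assumption) and aligns the
    two sequences end to end, which is a simplification of the
    subsequence matching a detection system would need.
    """
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    u = utt / np.linalg.norm(utt, axis=1, keepdims=True)
    dist = 1.0 - q @ u.T  # pairwise frame distances, shape (n, m)
    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    return acc[n, m] / (n + m)
```

In such a setup, a lower cost would indicate a better match between the query and the utterance, so test utterances could be ranked by `dtw_cost` for each query.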

Keywords

  • Dynamic time warping
  • Gaussian posteriorgram
  • Intrinsic spectral analysis
  • Spoken term detection
