Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection

Peng Yang; Cheung Chi Leung; Lei Xie; Bin Ma; Haizhou Li

School of Computer Science

Research output: Contribution to journal › Conference article › peer-review

22 Scopus citations

Abstract

We investigate the use of intrinsic spectral analysis (ISA) for query-by-example spoken term detection (QbE-STD). In the task, spoken queries and test utterances in an audio archive are converted to ISA features, and dynamic time warping is applied to match the feature sequence in each query with those in test utterances. Motivated by manifold learning, ISA has been proposed to recover from untranscribed utterances a set of nonlinear basis functions for the speech manifold, and shown with improved phonetic separability and inherent speaker independence. Due to the coarticulation phenomenon in speech, we propose to use temporal context information to obtain the ISA features. Gaussian posteriorgram, as an efficient acoustic representation usually used in QbE-STD, is considered a baseline feature. Experimental results on the TIMIT speech corpus show that the ISA features can provide a relative 13.5% improvement in mean average precision over the baseline features, when the temporal context information is used.
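The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `stack_context` and `dtw_cost`, the context width `k`, the cosine frame distance, and the length normalisation are all assumptions chosen for illustration, and the sketch aligns whole sequences, whereas a practical QbE-STD system would typically search for the query inside a longer utterance.

```python
import numpy as np

def stack_context(frames, k):
    """Append k left and k right neighbour frames to every frame.

    Hypothetical helper: edges are padded by repeating the first/last frame.
    """
    frames = np.asarray(frames, dtype=float)
    padded = np.pad(frames, ((k, k), (0, 0)), mode="edge")
    num_frames = frames.shape[0]
    # Slice i holds the frame at offset (i - k) relative to each centre frame.
    return np.hstack([padded[i:i + num_frames] for i in range(2 * k + 1)])

def dtw_cost(query, test):
    """Length-normalised DTW alignment cost between two feature sequences.

    Assumption: cosine distance between frames; other local distances
    could be substituted.
    """
    query = np.asarray(query, dtype=float)
    test = np.asarray(test, dtype=float)
    q = query / (np.linalg.norm(query, axis=1, keepdims=True) + 1e-12)
    t = test / (np.linalg.norm(test, axis=1, keepdims=True) + 1e-12)
    dist = 1.0 - q @ t.T  # pairwise frame distances, shape (n, m)
    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Standard step pattern: vertical, horizontal, or diagonal move.
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    return acc[n, m] / (n + m)
```

In this sketch a lower cost means a closer match, so test utterances would be ranked by ascending `dtw_cost` against the query before a ranking metric such as mean average precision is computed.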

Original language	English
Pages (from-to)	1722-1726
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State	Published - 2014
Event	15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore; Duration: 14 Sep 2014 → 18 Sep 2014

Keywords

Dynamic time warping
Gaussian posteriorgram
Intrinsic spectral analysis
Spoken term detection

Cite this

@article{f0fddefecd2a4a4785e34b9d743e9960,

title = "Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection",

abstract = "We investigate the use of intrinsic spectral analysis (ISA) for query-by-example spoken term detection (QbE-STD). In the task, spoken queries and test utterances in an audio archive are converted to ISA features, and dynamic time warping is applied to match the feature sequence in each query with those in test utterances. Motivated by manifold learning, ISA has been proposed to recover from untranscribed utterances a set of nonlinear basis functions for the speech manifold, and shown with improved phonetic separability and inherent speaker independence. Due to the coarticulation phenomenon in speech, we propose to use temporal context information to obtain the ISA features. Gaussian posteriorgram, as an efficient acoustic representation usually used in QbE-STD, is considered a baseline feature. Experimental results on the TIMIT speech corpus show that the ISA features can provide a relative 13.5% improvement in mean average precision over the baseline features, when the temporal context information is used.",

keywords = "Dynamic time warping, Gaussian posteriorgram, Intrinsic spectral analysis, Spoken term detection",

author = "Peng Yang and Leung, {Cheung Chi} and Lei Xie and Bin Ma and Haizhou Li",

note = "Publisher Copyright: Copyright {\textcopyright} 2014 ISCA.; 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 ; Conference date: 14-09-2014 Through 18-09-2014",

year = "2014",

language = "English",

pages = "1722--1726",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection

AU - Yang, Peng

AU - Leung, Cheung Chi

AU - Xie, Lei

AU - Ma, Bin

AU - Li, Haizhou

PY - 2014

Y1 - 2014

N2 - We investigate the use of intrinsic spectral analysis (ISA) for query-by-example spoken term detection (QbE-STD). In the task, spoken queries and test utterances in an audio archive are converted to ISA features, and dynamic time warping is applied to match the feature sequence in each query with those in test utterances. Motivated by manifold learning, ISA has been proposed to recover from untranscribed utterances a set of nonlinear basis functions for the speech manifold, and shown with improved phonetic separability and inherent speaker independence. Due to the coarticulation phenomenon in speech, we propose to use temporal context information to obtain the ISA features. Gaussian posteriorgram, as an efficient acoustic representation usually used in QbE-STD, is considered a baseline feature. Experimental results on the TIMIT speech corpus show that the ISA features can provide a relative 13.5% improvement in mean average precision over the baseline features, when the temporal context information is used.

AB - We investigate the use of intrinsic spectral analysis (ISA) for query-by-example spoken term detection (QbE-STD). In the task, spoken queries and test utterances in an audio archive are converted to ISA features, and dynamic time warping is applied to match the feature sequence in each query with those in test utterances. Motivated by manifold learning, ISA has been proposed to recover from untranscribed utterances a set of nonlinear basis functions for the speech manifold, and shown with improved phonetic separability and inherent speaker independence. Due to the coarticulation phenomenon in speech, we propose to use temporal context information to obtain the ISA features. Gaussian posteriorgram, as an efficient acoustic representation usually used in QbE-STD, is considered a baseline feature. Experimental results on the TIMIT speech corpus show that the ISA features can provide a relative 13.5% improvement in mean average precision over the baseline features, when the temporal context information is used.

KW - Dynamic time warping

KW - Gaussian posteriorgram

KW - Intrinsic spectral analysis

KW - Spoken term detection

UR - http://www.scopus.com/inward/record.url?scp=84910048259&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84910048259

SN - 2308-457X

SP - 1722

EP - 1726

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014

Y2 - 14 September 2014 through 18 September 2014

ER -
