Toward high-performance language-independent query-by-example spoken term detection for MediaEval 2015: Post-evaluation analysis

Cheung Chi Leung; Lei Wang; Haihua Xu; Jingyong Hou; Van Tung Pham; Hang Lv; Lei Xie; Xiong Xiao; Chongjia Ni; Bin Ma; Eng Siong Chng; Haizhou Li

doi:10.21437/Interspeech.2016-691

School of Computer Science

Research output: Contribution to journal › Conference article › Peer-reviewed

18 citations (Scopus)

Abstract

This paper documents the significant components of a state-of-the-art language-independent query-by-example spoken term detection system designed for the Query by Example Search on Speech Task (QUESST) in MediaEval 2015. We developed exact and partial matching DTW systems, and WFST based symbolic search systems to handle different types of search queries. To handle the noisy and reverberant speech in the task, we trained tokenizers using data augmented with different noise and reverberation conditions. Our post-evaluation analysis showed that the phone boundary label provided by the improved tokenizers brings more accurate speech activity detection in DTW systems. We argue that acoustic condition mismatch is possibly a more important factor than language mismatch for obtaining consistent gain from stacked bottleneck features. Our post-evaluation system, involving a smaller number of component systems, can outperform our submitted systems, which performed the best for the task.

Original language	English
Pages (from-to)	3703-3707
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	08-12-September-2016
DOI	https://doi.org/10.21437/Interspeech.2016-691
Publication status	Published - 2016
Event	17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States; Duration: 8 Sep 2016 → 16 Sep 2016

Cite this

Leung, C. C., Wang, L., Xu, H., Hou, J., Pham, V. T., Lv, H., Xie, L., Xiao, X., Ni, C., Ma, B., Chng, E. S., & Li, H. (2016). Toward high-performance language-independent query-by-example spoken term detection for MediaEval 2015: Post-evaluation analysis. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 08-12-September-2016, 3703-3707. https://doi.org/10.21437/Interspeech.2016-691

@article{85dde2c31e9e4e5f83ce4922e702b9f5,
title = "Toward high-performance language-independent query-by-example spoken term detection for MediaEval 2015: Post-evaluation analysis",
abstract = "This paper documents the significant components of a state-of-the-art language-independent query-by-example spoken term detection system designed for the Query by Example Search on Speech Task (QUESST) in MediaEval 2015. We developed exact and partial matching DTW systems, and WFST based symbolic search systems to handle different types of search queries. To handle the noisy and reverberant speech in the task, we trained tokenizers using data augmented with different noise and reverberation conditions. Our post-evaluation analysis showed that the phone boundary label provided by the improved tokenizers brings more accurate speech activity detection in DTW systems. We argue that acoustic condition mismatch is possibly a more important factor than language mismatch for obtaining consistent gain from stacked bottleneck features. Our post-evaluation system, involving a smaller number of component systems, can outperform our submitted systems, which performed the best for the task.",
keywords = "Bottleneck features, Data augmentation, Dynamic time warping, Partial matching, Symbolic search",
author = "Leung, {Cheung Chi} and Lei Wang and Haihua Xu and Jingyong Hou and Pham, {Van Tung} and Hang Lv and Lei Xie and Xiong Xiao and Chongjia Ni and Bin Ma and Chng, {Eng Siong} and Haizhou Li",
note = "Publisher Copyright: Copyright {\textcopyright} 2016 ISCA.; 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 ; Conference date: 08-09-2016 Through 16-09-2016",
year = "2016",
doi = "10.21437/Interspeech.2016-691",
language = "English",
volume = "08-12-September-2016",
pages = "3703--3707",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",
}

Leung, CC, Wang, L, Xu, H, Hou, J, Pham, VT, Lv, H, Xie, L, Xiao, X, Ni, C, Ma, B, Chng, ES & Li, H 2016, 'Toward high-performance language-independent query-by-example spoken term detection for MediaEval 2015: Post-evaluation analysis', Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 08-12-September-2016, pp. 3703-3707. https://doi.org/10.21437/Interspeech.2016-691

Toward high-performance language-independent query-by-example spoken term detection for MediaEval 2015: Post-evaluation analysis. / Leung, Cheung Chi; Wang, Lei; Xu, Haihua et al.
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 08-12-September-2016, 2016, p. 3703-3707.

TY - JOUR

T1 - Toward high-performance language-independent query-by-example spoken term detection for MediaEval 2015

T2 - 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016

AU - Leung, Cheung Chi

AU - Wang, Lei

AU - Xu, Haihua

AU - Hou, Jingyong

AU - Pham, Van Tung

AU - Lv, Hang

AU - Xie, Lei

AU - Xiao, Xiong

AU - Ni, Chongjia

AU - Ma, Bin

AU - Chng, Eng Siong

AU - Li, Haizhou

PY - 2016

Y1 - 2016

N2 - This paper documents the significant components of a state-of-the-art language-independent query-by-example spoken term detection system designed for the Query by Example Search on Speech Task (QUESST) in MediaEval 2015. We developed exact and partial matching DTW systems, and WFST based symbolic search systems to handle different types of search queries. To handle the noisy and reverberant speech in the task, we trained tokenizers using data augmented with different noise and reverberation conditions. Our post-evaluation analysis showed that the phone boundary label provided by the improved tokenizers brings more accurate speech activity detection in DTW systems. We argue that acoustic condition mismatch is possibly a more important factor than language mismatch for obtaining consistent gain from stacked bottleneck features. Our post-evaluation system, involving a smaller number of component systems, can outperform our submitted systems, which performed the best for the task.

AB - This paper documents the significant components of a state-of-the-art language-independent query-by-example spoken term detection system designed for the Query by Example Search on Speech Task (QUESST) in MediaEval 2015. We developed exact and partial matching DTW systems, and WFST based symbolic search systems to handle different types of search queries. To handle the noisy and reverberant speech in the task, we trained tokenizers using data augmented with different noise and reverberation conditions. Our post-evaluation analysis showed that the phone boundary label provided by the improved tokenizers brings more accurate speech activity detection in DTW systems. We argue that acoustic condition mismatch is possibly a more important factor than language mismatch for obtaining consistent gain from stacked bottleneck features. Our post-evaluation system, involving a smaller number of component systems, can outperform our submitted systems, which performed the best for the task.

KW - Bottleneck features

KW - Data augmentation

KW - Dynamic time warping

KW - Partial matching

KW - Symbolic search

UR - http://www.scopus.com/inward/record.url?scp=84994365926&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2016-691

DO - 10.21437/Interspeech.2016-691

M3 - Conference article

AN - SCOPUS:84994365926

SN - 2308-457X

VL - 08-12-September-2016

SP - 3703

EP - 3707

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Y2 - 8 September 2016 through 16 September 2016

ER -
