Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: A feasibility study

Hongjie Chen; Cheung Chi Leung; Lei Xie; Bin Ma; Haizhou Li

School of Computer Science

Research output: Contribution to journal › Conference article › Peer-reviewed

65 citations (Scopus)

Abstract

We adopt a Dirichlet process Gaussian mixture model (DPGMM) for unsupervised acoustic modeling and represent speech frames with Gaussian posteriorgrams. The model performs unsupervised clustering on untranscribed data, and each Gaussian component can be considered as a cluster of sounds from various speakers. The model infers its model complexity (i.e., the number of Gaussian components) from the data. For computational efficiency, we use a parallel sampler for the model inference. Our experiments are conducted on the corpus provided by the Zero Resource Speech Challenge. Experimental results show that the unsupervised DPGMM posteriorgrams obviously outperform MFCC, and perform comparably to the posteriorgrams derived from language-mismatched phoneme recognizers in terms of the error rate of the ABX discrimination test. The error rates can be further reduced by the fusion of these two kinds of posteriorgrams.
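As a rough illustration of what a Gaussian posteriorgram is, the sketch below computes, for each frame, the posterior probability of every mixture component given that frame. It is not the authors' implementation: the diagonal covariances, the function names, and the toy parameter values are all assumptions made for this example, and the inference step that would actually learn the components (the parallel sampler mentioned in the abstract) is not shown.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x (assumed form)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def posteriorgram(frames, weights, means, variances):
    """Return one posterior vector per frame: p(component k | frame)."""
    result = []
    for x in frames:
        # Log of weight_k * N(x; mean_k, var_k) for every component k.
        log_joint = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Subtract the maximum before exponentiating for numerical stability.
        peak = max(log_joint)
        unnorm = [math.exp(lj - peak) for lj in log_joint]
        total = sum(unnorm)
        result.append([u / total for u in unnorm])
    return result

# Hypothetical toy model with two components in two dimensions.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# Three toy "frames": near component 0, near component 1, and equidistant.
frames = [[0.1, -0.2], [3.9, 4.2], [2.0, 2.0]]
post = posteriorgram(frames, weights, means, variances)
```

Each row of `post` sums to one, so a sequence of frames becomes a sequence of probability vectors over the discovered components, which is the representation the abstract compares against MFCC features.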

Original language	English
Pages (from-to)	3189-3193
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2015-January
Publication status	Published - 2015
Event	16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany. Duration: 6 Sep 2015 → 10 Sep 2015

Cite this

@article{cfc1bbe54a7d4b4bbb75555fb3507ed4,

title = "Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: A feasibility study",

abstract = "We adopt a Dirichlet process Gaussian mixture model (DPGMM) for unsupervised acoustic modeling and represent speech frames with Gaussian posteriorgrams. The model performs unsupervised clustering on untranscribed data, and each Gaussian component can be considered as a cluster of sounds from various speakers. The model infers its model complexity (i.e., the number of Gaussian components) from the data. For computational efficiency, we use a parallel sampler for the model inference. Our experiments are conducted on the corpus provided by the Zero Resource Speech Challenge. Experimental results show that the unsupervised DPGMM posteriorgrams obviously outperform MFCC, and perform comparably to the posteriorgrams derived from language-mismatched phoneme recognizers in terms of the error rate of the ABX discrimination test. The error rates can be further reduced by the fusion of these two kinds of posteriorgrams.",

keywords = "ABX discrimination, Acoustic unit discovery, Bayesian nonparametrics, Gaussian posteriorgrams, Gibbs sampling",

author = "Hongjie Chen and Leung, {Cheung Chi} and Lei Xie and Bin Ma and Haizhou Li",

note = "Publisher Copyright: Copyright {\textcopyright} 2015 ISCA.; 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 ; Conference date: 06-09-2015 Through 10-09-2015",

year = "2015",

language = "English",

volume = "2015-January",

pages = "3189--3193",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: A feasibility study. / Chen, Hongjie; Leung, Cheung Chi; Xie, Lei et al.
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2015-January, 2015, pp. 3189-3193.

TY - JOUR

T1 - Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling

T2 - 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015

AU - Chen, Hongjie

AU - Leung, Cheung Chi

AU - Xie, Lei

AU - Ma, Bin

AU - Li, Haizhou

PY - 2015

Y1 - 2015

N2 - We adopt a Dirichlet process Gaussian mixture model (DPGMM) for unsupervised acoustic modeling and represent speech frames with Gaussian posteriorgrams. The model performs unsupervised clustering on untranscribed data, and each Gaussian component can be considered as a cluster of sounds from various speakers. The model infers its model complexity (i.e., the number of Gaussian components) from the data. For computational efficiency, we use a parallel sampler for the model inference. Our experiments are conducted on the corpus provided by the Zero Resource Speech Challenge. Experimental results show that the unsupervised DPGMM posteriorgrams obviously outperform MFCC, and perform comparably to the posteriorgrams derived from language-mismatched phoneme recognizers in terms of the error rate of the ABX discrimination test. The error rates can be further reduced by the fusion of these two kinds of posteriorgrams.

AB - We adopt a Dirichlet process Gaussian mixture model (DPGMM) for unsupervised acoustic modeling and represent speech frames with Gaussian posteriorgrams. The model performs unsupervised clustering on untranscribed data, and each Gaussian component can be considered as a cluster of sounds from various speakers. The model infers its model complexity (i.e., the number of Gaussian components) from the data. For computational efficiency, we use a parallel sampler for the model inference. Our experiments are conducted on the corpus provided by the Zero Resource Speech Challenge. Experimental results show that the unsupervised DPGMM posteriorgrams obviously outperform MFCC, and perform comparably to the posteriorgrams derived from language-mismatched phoneme recognizers in terms of the error rate of the ABX discrimination test. The error rates can be further reduced by the fusion of these two kinds of posteriorgrams.

KW - ABX discrimination

KW - Acoustic unit discovery

KW - Bayesian nonparametrics

KW - Gaussian posteriorgrams

KW - Gibbs sampling

UR - http://www.scopus.com/inward/record.url?scp=84959108948&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84959108948

SN - 2308-457X

VL - 2015-January

SP - 3189

EP - 3193

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Y2 - 6 September 2015 through 10 September 2015

ER -
