Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: A feasibility study

Hongjie Chen; Cheung Chi Leung; Lei Xie; Bin Ma; Haizhou Li

School of Computer Science

Research output: Contribution to journal › Conference article › peer-review

65 Scopus citations

Abstract

We adopt a Dirichlet process Gaussian mixture model (DPGMM) for unsupervised acoustic modeling and represent speech frames with Gaussian posteriorgrams. The model performs unsupervised clustering on untranscribed data, and each Gaussian component can be considered a cluster of sounds from various speakers. The model infers its complexity (i.e., the number of Gaussian components) from the data. For computational efficiency, we use a parallel sampler for model inference. Our experiments are conducted on the corpus provided by the Zero Resource Speech Challenge. Experimental results show that the unsupervised DPGMM posteriorgrams clearly outperform MFCC features and perform comparably to posteriorgrams derived from language-mismatched phoneme recognizers in terms of the error rate on the ABX discrimination test. The error rates can be further reduced by fusing these two kinds of posteriorgrams.
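As a rough illustration of what a Gaussian posteriorgram is, the sketch below computes, for each speech frame, the posterior probability of every mixture component. It is not the authors' implementation: it assumes the mixture parameters have already been inferred, it assumes diagonal covariances for brevity, and the function name `gaussian_posteriorgram` and the toy values are hypothetical.

```python
import numpy as np


def gaussian_posteriorgram(frames, weights, means, variances):
    """Return P(component k | frame t) for a diagonal-covariance Gaussian mixture.

    frames:    (T, D) feature vectors, e.g. one row per speech frame
    weights:   (K,)   mixture weights
    means:     (K, D) component means
    variances: (K, D) per-dimension variances (diagonal covariance is an assumption)
    """
    frames = np.asarray(frames, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # Per-frame, per-component log-likelihood under a diagonal Gaussian.
    diff = frames[:, None, :] - means[None, :, :]  # shape (T, K, D)
    log_lik = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )  # shape (T, K)

    # Add log prior weights, then normalize each row in a numerically stable way.
    log_joint = np.log(weights) + log_lik
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


# Toy example with made-up parameters: two components in two dimensions.
toy_weights = np.array([0.5, 0.5])
toy_means = np.array([[0.0, 0.0], [4.0, 4.0]])
toy_variances = np.ones((2, 2))
toy_frames = np.array([[0.1, -0.2], [3.9, 4.2], [2.0, 2.0]])

toy_posteriorgram = gaussian_posteriorgram(
    toy_frames, toy_weights, toy_means, toy_variances
)
```

Each row of `toy_posteriorgram` sums to one and serves as the frame's representation. In the setting described in the abstract, the number of components would come from the DPGMM inference rather than being fixed in advance as it is here.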

Original language	English
Pages (from-to)	3189-3193
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2015-January
State	Published - 2015
Event	16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015, Dresden, Germany (6 Sep 2015 to 10 Sep 2015)

Keywords

ABX discrimination
Acoustic unit discovery
Bayesian nonparametrics
Gaussian posteriorgrams
Gibbs sampling

Cite this

@article{cfc1bbe54a7d4b4bbb75555fb3507ed4,

title = "Parallel inference of {D}irichlet process {G}aussian mixture models for unsupervised acoustic modeling: A feasibility study",

abstract = "We adopt a Dirichlet process Gaussian mixture model (DPGMM) for unsupervised acoustic modeling and represent speech frames with Gaussian posteriorgrams. The model performs unsupervised clustering on untranscribed data, and each Gaussian component can be considered a cluster of sounds from various speakers. The model infers its complexity (i.e., the number of Gaussian components) from the data. For computational efficiency, we use a parallel sampler for model inference. Our experiments are conducted on the corpus provided by the Zero Resource Speech Challenge. Experimental results show that the unsupervised DPGMM posteriorgrams clearly outperform MFCC features and perform comparably to posteriorgrams derived from language-mismatched phoneme recognizers in terms of the error rate on the ABX discrimination test. The error rates can be further reduced by fusing these two kinds of posteriorgrams.",

keywords = "ABX discrimination, Acoustic unit discovery, Bayesian nonparametrics, Gaussian posteriorgrams, Gibbs sampling",

author = "Hongjie Chen and Leung, {Cheung Chi} and Lei Xie and Bin Ma and Haizhou Li",

note = "Publisher Copyright: Copyright {\textcopyright} 2015 ISCA.; 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 ; Conference date: 06-09-2015 Through 10-09-2015",

year = "2015",

language = "English",

volume = "2015-January",

pages = "3189--3193",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: A feasibility study. / Chen, Hongjie; Leung, Cheung Chi; Xie, Lei et al.
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2015-January, 2015, p. 3189-3193.

TY - JOUR

T1 - Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling

T2 - 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015

AU - Chen, Hongjie

AU - Leung, Cheung Chi

AU - Xie, Lei

AU - Ma, Bin

AU - Li, Haizhou

PY - 2015

Y1 - 2015

N2 - We adopt a Dirichlet process Gaussian mixture model (DPGMM) for unsupervised acoustic modeling and represent speech frames with Gaussian posteriorgrams. The model performs unsupervised clustering on untranscribed data, and each Gaussian component can be considered a cluster of sounds from various speakers. The model infers its complexity (i.e., the number of Gaussian components) from the data. For computational efficiency, we use a parallel sampler for model inference. Our experiments are conducted on the corpus provided by the Zero Resource Speech Challenge. Experimental results show that the unsupervised DPGMM posteriorgrams clearly outperform MFCC features and perform comparably to posteriorgrams derived from language-mismatched phoneme recognizers in terms of the error rate on the ABX discrimination test. The error rates can be further reduced by fusing these two kinds of posteriorgrams.

KW - ABX discrimination

KW - Acoustic unit discovery

KW - Bayesian nonparametrics

KW - Gaussian posteriorgrams

KW - Gibbs sampling

UR - http://www.scopus.com/inward/record.url?scp=84959108948&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84959108948

SN - 2308-457X

VL - 2015-January

SP - 3189

EP - 3193

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Y2 - 6 September 2015 through 10 September 2015

ER -
