TY - JOUR
T1 - Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling
T2 - 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
AU - Chen, Hongjie
AU - Leung, Cheung Chi
AU - Xie, Lei
AU - Ma, Bin
AU - Li, Haizhou
N1 - Publisher Copyright:
Copyright © 2015 ISCA.
PY - 2015
Y1 - 2015
N2 - We adopt a Dirichlet process Gaussian mixture model (DPGMM) for unsupervised acoustic modeling and represent speech frames with Gaussian posteriorgrams. The model performs unsupervised clustering on untranscribed data, and each Gaussian component can be considered as a cluster of sounds from various speakers. The model infers its model complexity (i.e., the number of Gaussian components) from the data. For computational efficiency, we use a parallel sampler for the model inference. Our experiments are conducted on the corpus provided by the Zero Resource Speech Challenge. Experimental results show that the unsupervised DPGMM posteriorgrams clearly outperform MFCC, and perform comparably to the posteriorgrams derived from language-mismatched phoneme recognizers in terms of the error rate of the ABX discrimination test. The error rates can be further reduced by the fusion of these two kinds of posteriorgrams.
AB - We adopt a Dirichlet process Gaussian mixture model (DPGMM) for unsupervised acoustic modeling and represent speech frames with Gaussian posteriorgrams. The model performs unsupervised clustering on untranscribed data, and each Gaussian component can be considered as a cluster of sounds from various speakers. The model infers its model complexity (i.e., the number of Gaussian components) from the data. For computational efficiency, we use a parallel sampler for the model inference. Our experiments are conducted on the corpus provided by the Zero Resource Speech Challenge. Experimental results show that the unsupervised DPGMM posteriorgrams clearly outperform MFCC, and perform comparably to the posteriorgrams derived from language-mismatched phoneme recognizers in terms of the error rate of the ABX discrimination test. The error rates can be further reduced by the fusion of these two kinds of posteriorgrams.
KW - ABX discrimination
KW - Acoustic unit discovery
KW - Bayesian nonparametrics
KW - Gaussian posteriorgrams
KW - Gibbs sampling
UR - http://www.scopus.com/inward/record.url?scp=84959108948&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84959108948
SN - 2308-457X
VL - 2015-January
SP - 3189
EP - 3193
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Y2 - 6 September 2015 through 10 September 2015
ER -