Voice conversion using Bayesian analysis and dynamic kernel features

Na Li, Xiangyang Zeng, Yu Qiao, Zhifeng Li

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

When the training utterances are sparse, voice conversion based on Mixture of Probabilistic Linear Regressions (MPLR) suffers from overfitting. To address this problem, we replace the cepstral features of the original speaker with dynamic kernel features and estimate the transformation parameters by maximum a posteriori (MAP) estimation under Bayesian inference. First, the features of the original speaker are converted into dynamic kernel features by a kernel transformation. Then, prior information on the transformation parameters is introduced. Finally, under different assumptions about the conversion error, we propose two methods for estimating the transformation parameters. Compared with MPLR, the proposed method achieves a 4.25% relative decrease in average cepstral distortion in objective evaluations and obtains higher naturalness and similarity scores in subjective evaluations. The experimental results indicate that the proposed method alleviates the overfitting problem.
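The paper itself gives the full derivation; as a rough illustration only, the MAP estimation step can be sketched under simplifying assumptions not stated in the abstract: a zero-mean Gaussian prior on the linear transform and Gaussian conversion error. With those assumptions, MAP estimation of the transform reduces to a ridge-regression-style closed form. All names and values below are hypothetical.

```python
import numpy as np

def map_linear_transform(X, Y, lam=1.0):
    """MAP estimate of a linear transform W mapping source features X
    (e.g. kernel-transformed frames) to target features Y.

    Assumes a zero-mean Gaussian prior on W and Gaussian conversion
    error, so the MAP solution is W = (X^T X + lam*I)^(-1) X^T Y,
    where lam reflects the prior/noise variance ratio.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Toy usage: 50 frames of 5-dim source features mapped to 5-dim targets.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
W_true = rng.standard_normal((5, 5))
Y = X @ W_true + 0.01 * rng.standard_normal((50, 5))
W_map = map_linear_transform(X, Y, lam=0.1)
```

The prior term `lam * np.eye(d)` is what regularizes the estimate when training data are sparse, which is the effect the abstract attributes to Bayesian inference; the paper's actual estimators operate on dynamic kernel features within the MPLR framework.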

Original language: English
Pages (from-to): 455-461
Number of pages: 7
Journal: Shengxue Xuebao/Acta Acustica
Volume: 40
Issue number: 3
State: Published - 1 May 2015
