Voice conversion using Bayesian analysis and dynamic kernel features

Na Li, Xiangyang Zeng, Yu Qiao, Zhifeng Li

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

When the training utterances are sparse, voice conversion based on the Mixture of Probabilistic Linear Regressions (MPLR) is prone to overfitting. To address this problem, we adopt dynamic kernel features in place of the original speaker's cepstral features and estimate the transformation parameters by maximum a posteriori (MAP) estimation with Bayesian inference. First, the features of the original speaker are converted into dynamic kernel features by a kernel transformation. Then, prior information on the transformation parameters is introduced. Finally, based on different assumptions about the conversion error, we propose two methods for estimating the transformation parameters. Compared with MPLR, the proposed method achieves a 4.25% relative reduction in average cepstral distortion in objective evaluations and obtains higher naturalness and similarity scores in subjective evaluations. Experimental results indicate that the proposed method can alleviate the overfitting problem.
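To illustrate the general idea described in the abstract, the sketch below shows MAP estimation of a linear transform applied to kernel-mapped source features. It is a minimal illustration, not the paper's implementation: it assumes an RBF kernel map, a zero-mean Gaussian prior on the regression weights, and Gaussian conversion error (which makes the MAP solution equivalent to ridge regression); function names, feature dimensions, and hyperparameters are hypothetical.

```python
# Illustrative sketch only: MAP-regularized linear regression on kernel
# features, assuming Gaussian prior and Gaussian conversion error.
import numpy as np

def rbf_kernel_features(X, centers, gamma=0.1):
    """Map source cepstral frames X (n x d) to kernel features via an
    RBF kernel evaluated against a set of centers (m x d)."""
    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)           # n x m kernel features

def map_linear_regression(Phi, Y, noise_var=1.0, prior_var=10.0):
    """MAP estimate of W in Y ~ Phi @ W with a zero-mean Gaussian prior
    on W; equivalent to ridge regression with lambda = noise_var/prior_var."""
    lam = noise_var / prior_var
    m = Phi.shape[1]
    A = Phi.T @ Phi + lam * np.eye(m)          # regularized normal equations
    return np.linalg.solve(A, Phi.T @ Y)       # m x d_target weight matrix

# Toy usage with random stand-in data (a real system would use time-aligned
# source/target cepstral frames from parallel training utterances).
rng = np.random.default_rng(0)
X_src = rng.normal(size=(200, 24))             # source cepstral frames
Y_tgt = rng.normal(size=(200, 24))             # aligned target cepstral frames
centers = X_src[rng.choice(200, 32, replace=False)]
Phi = rbf_kernel_features(X_src, centers)
W = map_linear_regression(Phi, Y_tgt)
Y_hat = Phi @ W                                # converted (predicted) frames
```

The prior variance acts as the regularizer that counteracts overfitting when training data are sparse; the paper's two estimation methods differ in their assumptions about the conversion error, which this single-Gaussian sketch does not distinguish.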

Original language: English
Pages (from-to): 455-461
Number of pages: 7
Journal: Shengxue Xuebao/Acta Acustica
Volume: 40
Issue number: 3
Publication status: Published - 1 May 2015
