Effective Wavenet Adaptation for Voice Conversion with Limited Data

Hongqiang Du; Xiaohai Tian; Lei Xie; Haizhou Li

doi:10.1109/ICASSP40776.2020.9053315

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

WaveNet has shown great potential as a direct conversion model in voice conversion. However, because of its model complexity, WaveNet requires a large amount of training data, which has limited its application in voice conversion, where training data is scarce. In this paper, we propose a WaveNet adaptation method that effectively reduces the need for adaptation data. We first train a speaker-independent WaveNet conversion model on a multi-speaker dataset. Adaptation is then applied with a limited amount of the target speaker's data. Specifically, singular value decomposition (SVD) is applied to the dilated convolution layers of WaveNet to reduce the number of parameters, which makes adaptation more effective with limited data. Experiments conducted on the CMU-ARCTIC and CSTR-VCTK corpora show that the proposed method outperforms baseline methods in terms of both quality and similarity.
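
The abstract says that SVD is applied to the dilated convolution layers to reduce the number of parameters. The sketch below illustrates only the general idea of a truncated-SVD factorization of one convolution weight tensor, and how the parameter count shrinks as a result. It is not the authors' implementation: the layer shape (64 output channels, 32 input channels, kernel width 2), the rank of 8, the random weights, and the helper name truncated_svd_factors are all assumptions chosen for illustration, and the record does not state which of the resulting factors the paper adapts.

```python
import numpy as np


def truncated_svd_factors(weight, rank):
    """Factor a 2-D weight matrix into two thin matrices via truncated SVD.

    Returns (a, b) with a.shape == (rows, rank) and b.shape == (rank, cols),
    so that a @ b is the best rank-`rank` approximation of `weight`
    in the Frobenius norm.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold the singular values into the left factor
    b = vt[:rank, :]
    return a, b


# Hypothetical dilated-convolution weight: (out_channels, in_channels, kernel).
rng = np.random.default_rng(0)
out_ch, in_ch, kernel = 64, 32, 2
conv_weight = rng.standard_normal((out_ch, in_ch, kernel))

# Flatten the input-channel and kernel axes so the weight becomes a matrix.
flat = conv_weight.reshape(out_ch, in_ch * kernel)

rank = 8  # assumed rank, chosen only for this example
a, b = truncated_svd_factors(flat, rank)

original_params = flat.size          # 64 * 64 entries
factored_params = a.size + b.size    # 64 * 8 + 8 * 64 entries
rel_error = np.linalg.norm(flat - a @ b) / np.linalg.norm(flat)

print(f"parameters: {original_params} -> {factored_params}")
print(f"relative reconstruction error at rank {rank}: {rel_error:.3f}")
```

Random weights have no low-rank structure, so the reconstruction error printed here is large; trained layers may compress better, but that depends on the model and is not something this record reports. With fewer free parameters per layer, there is less to estimate from a small amount of target-speaker data, which is the motivation the abstract gives.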

Original language	English
Title of host publication	2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	7779-7783
Number of pages	5
ISBN (Electronic)	9781509066315
DOIs	https://doi.org/10.1109/ICASSP40776.2020.9053315
State	Published - May 2020
Event	2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain; Duration: 4 May 2020 → 8 May 2020

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2020-May
ISSN (Print)	1520-6149

Conference

Conference	2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Country/Territory	Spain
City	Barcelona
Period	4/05/20 → 8/05/20

Keywords

Singular Value Decomposition (SVD)
Voice Conversion (VC)
WaveNet adaptation

Access to Document

10.1109/ICASSP40776.2020.9053315

Cite this

Du, H., Tian, X., Xie, L., & Li, H. (2020). Effective Wavenet Adaptation for Voice Conversion with Limited Data. In 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings (pp. 7779-7783). Article 9053315 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2020-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP40776.2020.9053315

Du, Hongqiang ; Tian, Xiaohai ; Xie, Lei et al. / Effective Wavenet Adaptation for Voice Conversion with Limited Data. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2020. pp. 7779-7783 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{3a318ba1b3a24180a1a632d87d842fe0,
title = "Effective Wavenet Adaptation for Voice Conversion with Limited Data",
abstract = "WaveNet has shown great potential as a direct conversion model in voice conversion. However, because of its model complexity, WaveNet requires a large amount of training data, which has limited its application in voice conversion, where training data is scarce. In this paper, we propose a WaveNet adaptation method that effectively reduces the need for adaptation data. We first train a speaker-independent WaveNet conversion model on a multi-speaker dataset. Adaptation is then applied with a limited amount of the target speaker's data. Specifically, singular value decomposition (SVD) is applied to the dilated convolution layers of WaveNet to reduce the number of parameters, which makes adaptation more effective with limited data. Experiments conducted on the CMU-ARCTIC and CSTR-VCTK corpora show that the proposed method outperforms baseline methods in terms of both quality and similarity.",
keywords = "Singular Value Decomposition (SVD), Voice Conversion (VC), WaveNet adaptation",
author = "Hongqiang Du and Xiaohai Tian and Lei Xie and Haizhou Li",
note = "Publisher Copyright: {\textcopyright} 2020 IEEE.; 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 ; Conference date: 04-05-2020 Through 08-05-2020",
year = "2020",
month = may,
doi = "10.1109/ICASSP40776.2020.9053315",
language = "English",
series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "7779--7783",
booktitle = "2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings",
}

Du, H, Tian, X, Xie, L & Li, H 2020, Effective Wavenet Adaptation for Voice Conversion with Limited Data. in 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings, 9053315, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2020-May, Institute of Electrical and Electronics Engineers Inc., pp. 7779-7783, 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020, Barcelona, Spain, 4/05/20. https://doi.org/10.1109/ICASSP40776.2020.9053315

Effective Wavenet Adaptation for Voice Conversion with Limited Data. / Du, Hongqiang; Tian, Xiaohai; Xie, Lei et al.
2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2020. p. 7779-7783 9053315 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2020-May).

TY - GEN
T1 - Effective Wavenet Adaptation for Voice Conversion with Limited Data
AU - Du, Hongqiang
AU - Tian, Xiaohai
AU - Xie, Lei
AU - Li, Haizhou
PY - 2020/5
Y1 - 2020/5
AB - WaveNet has shown great potential as a direct conversion model in voice conversion. However, because of its model complexity, WaveNet requires a large amount of training data, which has limited its application in voice conversion, where training data is scarce. In this paper, we propose a WaveNet adaptation method that effectively reduces the need for adaptation data. We first train a speaker-independent WaveNet conversion model on a multi-speaker dataset. Adaptation is then applied with a limited amount of the target speaker's data. Specifically, singular value decomposition (SVD) is applied to the dilated convolution layers of WaveNet to reduce the number of parameters, which makes adaptation more effective with limited data. Experiments conducted on the CMU-ARCTIC and CSTR-VCTK corpora show that the proposed method outperforms baseline methods in terms of both quality and similarity.
KW - Singular Value Decomposition (SVD)
KW - Voice Conversion (VC)
KW - WaveNet adaptation
UR - http://www.scopus.com/inward/record.url?scp=85089242967&partnerID=8YFLogxK
U2 - 10.1109/ICASSP40776.2020.9053315
DO - 10.1109/ICASSP40776.2020.9053315
M3 - Conference contribution
AN - SCOPUS:85089242967
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 7779
EP - 7783
BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Y2 - 4 May 2020 through 8 May 2020
ER -

Du H, Tian X, Xie L, Li H. Effective Wavenet Adaptation for Voice Conversion with Limited Data. In 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2020. p. 7779-7783. 9053315. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP40776.2020.9053315
