An Automatic Voice Conversion Evaluation Strategy Based on Perceptual Background Noise Distortion and Speaker Similarity

Dong Yan Huang, Lei Xie, Yvonne Siu Wa Lee, Jie Wu, Huaiping Ming, Xiaohai Tian, Shaofei Zhang, Chuang Ding, Mei Li, Quy Hy Nguyen, Minghui Dong, Eng Siong Chng, Haizhou Li

Research output: Contribution to conference › Paper › peer-review

7 Scopus citations

Abstract

Voice conversion aims to modify the voice characteristics of one speaker so that the speech sounds as if it were spoken by another speaker, without changing the linguistic content. This task has attracted considerable attention, and various approaches have been proposed over the past two decades. The evaluation of voice conversion approaches, usually through time-intensive subjective listening tests, requires a huge amount of human labor. This paper proposes an automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity. Experimental results show that our automatic evaluation results match the subjective listening results quite well. We further use our strategy to select the best converted samples from multiple voice conversion systems, and our submission achieves promising results in the Voice Conversion Challenge (VCC2016).

Original language: English
Pages: 44-51
Number of pages: 8
State: Published - 2016
Event: 9th ISCA Speech Synthesis Workshop, SSW 2016 - Sunnyvale, United States
Duration: 13 Sep 2016 to 15 Sep 2016

Conference

Conference: 9th ISCA Speech Synthesis Workshop, SSW 2016
Country/Territory: United States
City: Sunnyvale
Period: 13/09/16 to 15/09/16

Keywords

  • objective measures
  • speaker similarity score
  • speech quality assessment
  • subjective listening tests
  • voice conversion
