An Automatic Voice Conversion Evaluation Strategy Based on Perceptual Background Noise Distortion and Speaker Similarity

Dong Yan Huang; Lei Xie; Yvonne Siu Wa Lee; Jie Wu; Huaiping Ming; Xiaohai Tian; Shaofei Zhang; Chuang Ding; Mei Li; Quy Hy Nguyen; Minghui Dong; Eng Siong Chng; Haizhou Li

School of Computer Science

Research output: Contribution to conference › Paper › peer-review

7 Scopus citations

Abstract

Voice conversion aims to modify the characteristics of one speaker's voice so that it sounds as if spoken by another speaker, without changing the linguistic content. The task has attracted considerable attention, and various approaches have been proposed over the past two decades. The evaluation of voice conversion approaches, usually through time-intensive subjective listening tests, requires a huge amount of human labor. This paper proposes an automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity. Experimental results show that our automatic evaluation results match the subjective listening results quite well. We further use our strategy to select the best converted samples from multiple voice conversion systems, and our submission achieves promising results in the Voice Conversion Challenge 2016 (VCC2016).

Original language	English
Pages	44-51
Number of pages	8
State	Published - 2016
Event	9th ISCA Speech Synthesis Workshop, SSW 2016 - Sunnyvale, United States
Duration	13 Sep 2016 → 15 Sep 2016

Conference

Conference	9th ISCA Speech Synthesis Workshop, SSW 2016
Country/Territory	United States
City	Sunnyvale
Period	13/09/16 → 15/09/16

Keywords

objective measures
speaker similarity score
speech quality assessment
subjective listening tests
Voice conversion

Cite this

@conference{4feffad504da455f9443e8d4d3119dfc,

title = "An Automatic Voice Conversion Evaluation Strategy Based on Perceptual Background Noise Distortion and Speaker Similarity",

abstract = "Voice conversion aims to modify the characteristics of one speaker's voice so that it sounds as if spoken by another speaker, without changing the linguistic content. The task has attracted considerable attention, and various approaches have been proposed over the past two decades. The evaluation of voice conversion approaches, usually through time-intensive subjective listening tests, requires a huge amount of human labor. This paper proposes an automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity. Experimental results show that our automatic evaluation results match the subjective listening results quite well. We further use our strategy to select the best converted samples from multiple voice conversion systems, and our submission achieves promising results in the Voice Conversion Challenge 2016 (VCC2016).",

keywords = "objective measures, speaker similarity score, speech quality assessment, subjective listening tests, Voice conversion",

author = "Huang, {Dong Yan} and Lei Xie and Lee, {Yvonne Siu Wa} and Jie Wu and Huaiping Ming and Xiaohai Tian and Shaofei Zhang and Chuang Ding and Mei Li and Nguyen, {Quy Hy} and Minghui Dong and Chng, {Eng Siong} and Haizhou Li",

note = "Publisher Copyright: {\textcopyright} 2016, 9th ISCA Speech Synthesis Workshop, SSW 2016. All rights reserved.; 9th ISCA Speech Synthesis Workshop, SSW 2016 ; Conference date: 13-09-2016 Through 15-09-2016",

year = "2016",

language = "English",

pages = "44--51",

}

TY - CONF

T1 - An Automatic Voice Conversion Evaluation Strategy Based on Perceptual Background Noise Distortion and Speaker Similarity

AU - Huang, Dong Yan

AU - Xie, Lei

AU - Lee, Yvonne Siu Wa

AU - Wu, Jie

AU - Ming, Huaiping

AU - Tian, Xiaohai

AU - Zhang, Shaofei

AU - Ding, Chuang

AU - Li, Mei

AU - Nguyen, Quy Hy

AU - Dong, Minghui

AU - Chng, Eng Siong

AU - Li, Haizhou

PY - 2016

Y1 - 2016

N2 - Voice conversion aims to modify the characteristics of one speaker's voice so that it sounds as if spoken by another speaker, without changing the linguistic content. The task has attracted considerable attention, and various approaches have been proposed over the past two decades. The evaluation of voice conversion approaches, usually through time-intensive subjective listening tests, requires a huge amount of human labor. This paper proposes an automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity. Experimental results show that our automatic evaluation results match the subjective listening results quite well. We further use our strategy to select the best converted samples from multiple voice conversion systems, and our submission achieves promising results in the Voice Conversion Challenge 2016 (VCC2016).

AB - Voice conversion aims to modify the characteristics of one speaker's voice so that it sounds as if spoken by another speaker, without changing the linguistic content. The task has attracted considerable attention, and various approaches have been proposed over the past two decades. The evaluation of voice conversion approaches, usually through time-intensive subjective listening tests, requires a huge amount of human labor. This paper proposes an automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity. Experimental results show that our automatic evaluation results match the subjective listening results quite well. We further use our strategy to select the best converted samples from multiple voice conversion systems, and our submission achieves promising results in the Voice Conversion Challenge 2016 (VCC2016).

KW - objective measures

KW - speaker similarity score

KW - speech quality assessment

KW - subjective listening tests

KW - Voice conversion

UR - http://www.scopus.com/inward/record.url?scp=85075288991&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85075288991

SP - 44

EP - 51

T2 - 9th ISCA Speech Synthesis Workshop, SSW 2016

Y2 - 13 September 2016 through 15 September 2016

ER -
