Exemplar-based sparse representation of timbre and prosody for voice conversion

Huaiping Ming, Dongyan Huang, Lei Xie, Shaofei Zhang, Minghui Dong, Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

35 Scopus citations

Abstract

Voice conversion (VC) aims to make the speech of one speaker (source) sound as if it were spoken by another speaker (target) without changing the language content. Most state-of-the-art voice conversion systems focus only on timbre conversion. However, speaker identity is also characterized by source-related cues such as fundamental frequency and energy. In this work, we propose an exemplar-based sparse representation of timbre and prosody for voice conversion that does not require separate timbre and prosody conversions. The experimental results show that, in addition to the conversion of spectral features, proper conversion of prosody features improves the quality and speaker identity of the converted speech.
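
The abstract does not give the formulation, so the following is only a minimal sketch of the general exemplar-based idea, not the authors' implementation. It assumes paired source and target dictionaries whose columns are time-aligned, non-negative exemplars (each column could stack spectral and prosody features for one frame, which is an assumption here), estimates sparse non-negative activations of the source exemplars with multiplicative updates, and applies the same activations to the target exemplars. The names estimate_activations, convert, A_src, B_tgt and sparsity are hypothetical.

```python
import numpy as np


def estimate_activations(X, A, sparsity=0.1, n_iter=200, eps=1e-12):
    """Estimate non-negative activations H such that X is approximately A @ H.

    X: (dim, frames) non-negative source features.
    A: (dim, exemplars) non-negative source dictionary (assumed given).
    Uses multiplicative updates for a Euclidean cost plus an L1 penalty
    on H; this particular cost is an illustrative choice.
    """
    H = np.full((A.shape[1], X.shape[1]), 1.0 / A.shape[1])
    AtX = A.T @ X
    AtA = A.T @ A
    for _ in range(n_iter):
        # The update keeps H non-negative; the sparsity term shrinks activations.
        H *= AtX / (AtA @ H + sparsity + eps)
    return H


def convert(X_src, A_src, B_tgt, **kwargs):
    """Reuse the source activations on the paired target dictionary."""
    H = estimate_activations(X_src, A_src, **kwargs)
    return B_tgt @ H, H
```

Because the two dictionaries are assumed to be aligned column by column, the activation matrix estimated on the source side can be reused unchanged on the target side; a larger sparsity value pushes each frame to be explained by fewer exemplars.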

Original language: English
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5175-5179
Number of pages: 5
ISBN (Electronic): 9781479999880
State: Published - 18 May 2016
Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: 20 Mar 2016 - 25 Mar 2016

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2016-May
ISSN (Print): 1520-6149

Conference

Conference: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Country/Territory: China
City: Shanghai
Period: 20/03/16 - 25/03/16

Keywords

  • exemplar
  • prosody
  • sparse representation
  • timbre
  • Voice conversion
