Exemplar-based sparse representation of timbre and prosody for voice conversion

Huaiping Ming; Dongyan Huang; Lei Xie; Shaofei Zhang; Minghui Dong; Haizhou Li

doi:10.1109/ICASSP.2016.7472664

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

35 Scopus citations

Abstract

Voice conversion (VC) aims to make the speech of one speaker (the source) sound as if it were spoken by another speaker (the target) without changing the linguistic content. Most state-of-the-art voice conversion systems focus only on timbre conversion. However, speaker identity is also characterized by source-related cues such as fundamental frequency and energy. In this work, we propose an exemplar-based sparse representation of timbre and prosody for voice conversion that does not require separate timbre and prosody conversions. The experimental results show that, in addition to the conversion of spectral features, proper conversion of prosody features improves the quality and speaker identity of the converted speech.
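The abstract does not give the formulation, so the following is only a generic sketch of exemplar-based sparse representation as it is commonly described for voice conversion, not the authors' method. It assumes paired, frame-aligned source and target dictionaries (`A_src`, `B_tgt`), non-negative features, and a KL-divergence objective with an L1 sparsity penalty solved by multiplicative updates. All names and parameter values here are hypothetical, and stacking spectral and prosodic features in a single exemplar column is an assumption made for illustration.

```python
import numpy as np


def estimate_activations(X, A, sparsity=0.1, n_iter=200, eps=1e-12):
    """Estimate non-negative sparse activations H such that X ~ A @ H.

    X: (d, T) non-negative source features, one column per frame.
    A: (d, K) non-negative source dictionary, one exemplar per column.
    Minimises KL(X || A @ H) + sparsity * sum(H) with multiplicative updates.
    """
    rng = np.random.default_rng(0)
    H = rng.random((A.shape[1], X.shape[1])) + eps  # positive initialisation
    ones = np.ones_like(X)
    for _ in range(n_iter):
        # The multiplicative form keeps every entry of H non-negative.
        H *= (A.T @ (X / (A @ H + eps))) / (A.T @ ones + sparsity)
    return H


def convert(X_src, A_src, B_tgt, **kwargs):
    """Apply the activations found on the source dictionary to the target one."""
    H = estimate_activations(X_src, A_src, **kwargs)
    return B_tgt @ H


# Toy data: 12-dimensional frames, 6 paired exemplars, 5 frames to convert.
# In a joint setup (an assumption), some rows would hold spectral features
# and the remaining rows prosodic features such as F0 and energy.
demo_rng = np.random.default_rng(1)
A_demo = demo_rng.random((12, 6)) + 0.1
B_demo = demo_rng.random((12, 6)) + 0.1
X_demo = A_demo @ demo_rng.random((6, 5))
Y_demo = convert(X_demo, A_demo, B_demo)
```

Because the same activation matrix is applied to both dictionaries, any feature rows included in the exemplars are converted together, which is one way to read "does not require separate timbre and prosody conversions".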

Original language	English
Title of host publication	2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	5175-5179
Number of pages	5
ISBN (Electronic)	9781479999880
DOIs	https://doi.org/10.1109/ICASSP.2016.7472664
State	Published - 18 May 2016
Event	41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration	20 Mar 2016 → 25 Mar 2016

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2016-May
ISSN (Print)	1520-6149

Conference

Conference	41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Country/Territory	China
City	Shanghai
Period	20/03/16 → 25/03/16

Keywords

exemplar
prosody
sparse representation
timbre
Voice conversion

Access to Document

10.1109/ICASSP.2016.7472664

Cite this

Ming, H., Huang, D., Xie, L., Zhang, S., Dong, M., & Li, H. (2016). Exemplar-based sparse representation of timbre and prosody for voice conversion. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings (pp. 5175-5179). Article 7472664 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2016-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2016.7472664

Ming, Huaiping ; Huang, Dongyan ; Xie, Lei et al. / Exemplar-based sparse representation of timbre and prosody for voice conversion. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2016. pp. 5175-5179 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{f6b207abd65e4573a2f1bc86736136b6,
title = "Exemplar-based sparse representation of timbre and prosody for voice conversion",
abstract = "Voice conversion (VC) aims to make the speech of one speaker (the source) sound as if it were spoken by another speaker (the target) without changing the linguistic content. Most state-of-the-art voice conversion systems focus only on timbre conversion. However, speaker identity is also characterized by source-related cues such as fundamental frequency and energy. In this work, we propose an exemplar-based sparse representation of timbre and prosody for voice conversion that does not require separate timbre and prosody conversions. The experimental results show that, in addition to the conversion of spectral features, proper conversion of prosody features improves the quality and speaker identity of the converted speech.",
keywords = "exemplar, prosody, sparse representation, timbre, Voice conversion",
author = "Huaiping Ming and Dongyan Huang and Lei Xie and Shaofei Zhang and Minghui Dong and Haizhou Li",
note = "Publisher Copyright: {\textcopyright} 2016 IEEE.; 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 ; Conference date: 20-03-2016 Through 25-03-2016",
year = "2016",
month = may,
day = "18",
doi = "10.1109/ICASSP.2016.7472664",
language = "English",
series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "5175--5179",
booktitle = "2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings",
}

Ming, H, Huang, D, Xie, L, Zhang, S, Dong, M & Li, H 2016, Exemplar-based sparse representation of timbre and prosody for voice conversion. in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings, 7472664, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2016-May, Institute of Electrical and Electronics Engineers Inc., pp. 5175-5179, 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, 20/03/16. https://doi.org/10.1109/ICASSP.2016.7472664

Exemplar-based sparse representation of timbre and prosody for voice conversion. / Ming, Huaiping; Huang, Dongyan; Xie, Lei et al.
2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2016. p. 5175-5179 7472664 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2016-May).

TY - GEN
T1 - Exemplar-based sparse representation of timbre and prosody for voice conversion
AU - Ming, Huaiping
AU - Huang, Dongyan
AU - Xie, Lei
AU - Zhang, Shaofei
AU - Dong, Minghui
AU - Li, Haizhou
PY - 2016/5/18
Y1 - 2016/5/18
AB - Voice conversion (VC) aims to make the speech of one speaker (the source) sound as if it were spoken by another speaker (the target) without changing the linguistic content. Most state-of-the-art voice conversion systems focus only on timbre conversion. However, speaker identity is also characterized by source-related cues such as fundamental frequency and energy. In this work, we propose an exemplar-based sparse representation of timbre and prosody for voice conversion that does not require separate timbre and prosody conversions. The experimental results show that, in addition to the conversion of spectral features, proper conversion of prosody features improves the quality and speaker identity of the converted speech.
KW - exemplar
KW - prosody
KW - sparse representation
KW - timbre
KW - Voice conversion
UR - http://www.scopus.com/inward/record.url?scp=84973338474&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2016.7472664
DO - 10.1109/ICASSP.2016.7472664
M3 - Conference contribution
AN - SCOPUS:84973338474
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5175
EP - 5179
BT - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Y2 - 20 March 2016 through 25 March 2016
ER -

Ming H, Huang D, Xie L, Zhang S, Dong M, Li H. Exemplar-based sparse representation of timbre and prosody for voice conversion. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2016. p. 5175-5179. 7472664. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP.2016.7472664
