On the use of I-vectors and average voice model for voice conversion without parallel data

Jie Wu, Zhizheng Wu, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Citations (Scopus)

Abstract

Recently, deep and/or recurrent neural networks (DNNs/RNNs) have been employed for voice conversion and have significantly improved the quality of converted speech. However, DNNs/RNNs generally require a large amount of parallel training data (e.g., hundreds of utterances) from the source and target speakers. Collecting such a large amount of data is expensive, and impossible in some applications, such as cross-lingual conversion. To solve this problem, we propose to use an average voice model and i-vectors for long short-term memory (LSTM) based voice conversion, which does not require parallel data from the source and target speakers. The average voice model is trained on other speakers' data, and the i-vectors, compact vector representations of the source and target speaker identities, are extracted independently. Subjective evaluation has confirmed the effectiveness of the proposed approach.
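For illustration only, the sketch below shows one common way an LSTM conversion model can be conditioned on a speaker i-vector, by tiling the i-vector across time and concatenating it with the per-frame acoustic features. This is a minimal sketch of the general idea, not the authors' implementation; the class name, feature dimension (40), i-vector dimension (100), and network sizes are all assumptions.

# Minimal illustrative sketch (assumptions noted above), not the paper's exact model.
import torch
import torch.nn as nn


class IVectorLSTMConverter(nn.Module):
    """Maps source acoustic frames to target-like frames, conditioned on a
    target-speaker i-vector that is tiled over time and concatenated per frame."""

    def __init__(self, feat_dim=40, ivec_dim=100, hidden_dim=256, num_layers=2):
        super().__init__()
        # Input at each frame = acoustic features + speaker i-vector.
        self.lstm = nn.LSTM(feat_dim + ivec_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, feat_dim)

    def forward(self, frames, ivector):
        # frames: (batch, time, feat_dim); ivector: (batch, ivec_dim)
        ivec_tiled = ivector.unsqueeze(1).expand(-1, frames.size(1), -1)
        x = torch.cat([frames, ivec_tiled], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)


if __name__ == "__main__":
    model = IVectorLSTMConverter()
    src = torch.randn(4, 200, 40)    # e.g. 200 frames of 40-dim spectral features
    ivec = torch.randn(4, 100)       # target-speaker i-vector
    converted = model(src, ivec)     # (4, 200, 40)
    print(converted.shape)

Because the speaker identity enters only through the i-vector, such a model can in principle be trained on non-parallel data from many speakers (an "average voice" over the training pool) and steered toward a new target speaker by swapping in that speaker's i-vector at conversion time, which is the general setting the abstract describes.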

Original language: English
Title of host publication: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9789881476821
DOI
Publication status: Published - 17 Jan 2017
Event: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016 - Jeju, South Korea
Duration: 13 Dec 2016 - 16 Dec 2016

Publication series

Name: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016

Conference

Conference: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
Country/Territory: South Korea
City: Jeju
Period: 13/12/16 - 16/12/16
