TY - GEN
T1 - The multi-speaker multi-style voice cloning challenge 2021
AU - Xie, Qicong
AU - Tian, Xiaohai
AU - Liu, Guanghou
AU - Song, Kun
AU - Xie, Lei
AU - Wu, Zhiyong
AU - Li, Hai
AU - Shi, Song
AU - Li, Haizhou
AU - Hong, Fen
AU - Bu, Hui
AU - Xu, Xin
N1 - Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - The Multi-speaker Multi-style Voice Cloning Challenge (M2VoC) aims to provide a common sizable dataset and a fair testbed for benchmarking the popular voice cloning task. Specifically, we formulate the challenge as adapting an average TTS model to a stylistic target voice with limited data from the target speaker, evaluated by speaker identity and style similarity. The challenge consists of two tracks, a few-shot track and a one-shot track, in which participants are required to clone multiple target voices with 100 and 5 samples, respectively. Each track contains two sub-tracks. In sub-track a, to fairly compare different strategies, participants are strictly limited to the training data provided by the organizer. In sub-track b, participants may use any publicly available data. In this paper, we present a detailed explanation of the tasks and data used in the challenge, followed by a summary of the submitted systems and the evaluation results.
AB - The Multi-speaker Multi-style Voice Cloning Challenge (M2VoC) aims to provide a common sizable dataset and a fair testbed for benchmarking the popular voice cloning task. Specifically, we formulate the challenge as adapting an average TTS model to a stylistic target voice with limited data from the target speaker, evaluated by speaker identity and style similarity. The challenge consists of two tracks, a few-shot track and a one-shot track, in which participants are required to clone multiple target voices with 100 and 5 samples, respectively. Each track contains two sub-tracks. In sub-track a, to fairly compare different strategies, participants are strictly limited to the training data provided by the organizer. In sub-track b, participants may use any publicly available data. In this paper, we present a detailed explanation of the tasks and data used in the challenge, followed by a summary of the submitted systems and the evaluation results.
KW - Speaker adaptation
KW - Speech synthesis
KW - Transfer learning
KW - Voice cloning
UR - http://www.scopus.com/inward/record.url?scp=85108665338&partnerID=8YFLogxK
U2 - 10.1109/ICASSP39728.2021.9414001
DO - 10.1109/ICASSP39728.2021.9414001
M3 - Conference contribution
AN - SCOPUS:85108665338
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 8613
EP - 8617
BT - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021
Y2 - 6 June 2021 through 11 June 2021
ER -