Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS

Kun Song, Jian Cong, Xinsheng Wang, Yongmao Zhang, Lei Xie, Ning Jiang, Haiying Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

4 Citations (Scopus)

Abstract

In the current two-stage neural text-to-speech (TTS) paradigm, it is desirable to have a universal neural vocoder that, once trained, is robust to the imperfect mel-spectrograms predicted by the acoustic model. To this end, we propose the Robust MelGAN vocoder, which solves the metallic-sound problem of the original multi-band MelGAN and improves its generalization ability. Specifically, we introduce a fine-grained network dropout strategy to the generator. Using a specifically designed over-smooth handler that separates the speech signal into periodic and aperiodic components, we apply network dropout only to the aperiodic components, which alleviates the metallic sound while maintaining good speaker similarity. To further improve generalization, we introduce several data augmentation methods that augment the fake data fed to the discriminator, including harmonic shift, harmonic noise and phase noise. Experiments show that Robust MelGAN can be used as a universal vocoder, significantly improving sound quality in TTS systems built on various types of data. Audio samples are available at https://RobustMelGAN.github.io/RobustMelGAN/
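The abstract lists phase noise as one of the augmentations applied to fake (generated) audio before it reaches the discriminator, but it does not give the exact formulation. The sketch below is a minimal illustration under an assumption: that "phase noise" means adding Gaussian noise to the STFT phase while keeping the magnitude unchanged. The function name `phase_noise_augment` and its parameters (`n_fft`, `hop`, `noise_std`) are hypothetical and are not taken from the paper.

```python
import numpy as np
from scipy.signal import stft, istft


def phase_noise_augment(wav, sr=22050, n_fft=1024, hop=256, noise_std=0.1, rng=None):
    """Hypothetical phase-noise augmentation (assumed formulation, not the paper's).

    Keeps the STFT magnitude of `wav` and adds zero-mean Gaussian noise
    (std `noise_std`, in radians) to its phase, then resynthesizes a waveform.
    """
    rng = np.random.default_rng() if rng is None else rng
    noverlap = n_fft - hop
    # Analysis: complex one-sided STFT of the (real) waveform.
    _, _, spec = stft(wav, fs=sr, nperseg=n_fft, noverlap=noverlap)
    mag, phase = np.abs(spec), np.angle(spec)
    # Perturb only the phase; the magnitude (and hence the mel-spectrogram) is kept.
    noisy_phase = phase + rng.normal(0.0, noise_std, size=phase.shape)
    # Synthesis: inverse STFT with the same window/overlap settings.
    _, out = istft(mag * np.exp(1j * noisy_phase), fs=sr, nperseg=n_fft, noverlap=noverlap)
    # istft may return a few padded samples; trim back to the input length.
    return out[: len(wav)]


# Usage: augment a 1 s, 220 Hz test tone (stand-in for a generator output).
t = np.arange(22050) / 22050
fake = 0.5 * np.sin(2 * np.pi * 220 * t)
augmented = phase_noise_augment(fake, noise_std=0.5, rng=np.random.default_rng(0))
```

Because only the phase is perturbed, the augmented sample keeps roughly the same spectral envelope as the original fake audio. In a GAN setup such as the one the abstract describes, this would make the discriminator's fake class more diverse without changing what the mel-spectrogram conditions on.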

Original language: English
Host publication title: 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
Editors: Kong Aik Lee, Hung-yi Lee, Yanfeng Lu, Minghui Dong
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 71-75
Number of pages: 5
ISBN (electronic): 9798350397963
DOI
Publication status: Published - 2022
Event: 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022 - Singapore, Singapore
Duration: 11 Dec 2022 to 14 Dec 2022

Publication series

Name: 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022

Conference

Conference: 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
Country/Territory: Singapore
City: Singapore
Period: 11/12/22 to 14/12/22
