
TeNC: Low Bit-Rate Speech Coding with VQ-VAE and GAN

  • Yi Chen
  • Shan Yang
  • Na Hu
  • Lei Xie
  • Dan Su
  • Northwestern Polytechnical University Xian
  • Tencent

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Citations (Scopus)

Abstract

Speech coding aims to compress digital speech signals into fewer bits and reconstruct them back into raw signals while preserving speech quality as much as possible. However, conventional codecs usually need a high bit-rate to reconstruct speech with reasonably high quality. In this paper, we propose an end-to-end neural generative codec that combines a VQ-VAE based auto-encoder with a generative adversarial network (GAN) and reconstructs high-fidelity speech at a low bit-rate of about 2 kb/s. Compression is carried out by a down-sampling module in the encoder together with a learnable discrete codebook, and the GAN further improves the reconstruction quality. Our experiments confirm the effectiveness of the proposed model in both objective and subjective tests: at low bit-rates it significantly outperforms conventional codecs in terms of speech quality and speaker similarity.
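To make the codebook step and the bit-rate arithmetic concrete, here is a minimal pure-Python sketch of nearest-neighbour vector quantization. It is not the authors' implementation: the function names `quantize` and `bit_rate`, the 16 kHz sample rate, the down-sampling factor of 80, and the 1024-entry codebook are hypothetical values chosen only so that the arithmetic lands on roughly the 2 kb/s figure quoted above. VQ-VAE training details (codebook and commitment losses, straight-through gradients) and the GAN are omitted.

```python
import math
from typing import List, Sequence, Tuple


def quantize(
    frames: Sequence[Sequence[float]], codebook: Sequence[Sequence[float]]
) -> Tuple[List[int], List[List[float]]]:
    """Map each encoder output frame to its nearest codeword (squared L2 distance).

    Only the integer indices would need to be transmitted; the decoder looks the
    codewords back up. This is a generic VQ sketch, not the paper's code.
    """
    indices: List[int] = []
    recon: List[List[float]] = []
    for z in frames:
        # Squared Euclidean distance from this frame to every codeword.
        dists = [sum((a - b) ** 2 for a, b in zip(z, c)) for c in codebook]
        k = min(range(len(codebook)), key=dists.__getitem__)
        indices.append(k)
        recon.append(list(codebook[k]))
    return indices, recon


def bit_rate(
    sample_rate: int, downsample: int, codebook_size: int, num_codebooks: int = 1
) -> float:
    """Bits per second = frames per second * codebooks per frame * bits per index.

    All arguments are hypothetical; the paper's actual settings are not given here.
    """
    frame_rate = sample_rate / downsample  # encoder frames per second
    return frame_rate * num_codebooks * math.log2(codebook_size)


if __name__ == "__main__":
    idx, _ = quantize([[0.1, 0.2], [0.9, 0.8]], [[0.0, 0.0], [1.0, 1.0]])
    print(idx)  # [0, 1]
    print(bit_rate(16000, 80, 1024))  # 2000.0 (assumed values)
```

With these assumed values, the encoder emits 16000 / 80 = 200 frames per second and each frame index costs log2(1024) = 10 bits, giving 2000 b/s. Halving the frame rate while using two codebooks per frame keeps the same budget, which is one way such a design can trade temporal resolution against codebook capacity.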

Original language: English
Host publication title: ICMI 2021 Companion - Companion Publication of the 2021 International Conference on Multimodal Interaction
Publisher: Association for Computing Machinery, Inc
Pages: 126-130
Number of pages: 5
ISBN (electronic): 9781450384711
DOI
Publication status: Published - 18 Oct 2021
Externally published: Yes
Event: 23rd ACM International Conference on Multimodal Interaction, ICMI 2021 - Virtual, Online, Canada
Duration: 18 Oct 2021 to 22 Oct 2021

Publication series

Name: ICMI 2021 Companion - Companion Publication of the 2021 International Conference on Multimodal Interaction

Conference

Conference: 23rd ACM International Conference on Multimodal Interaction, ICMI 2021
Country/Territory: Canada
City: Virtual, Online
Period: 18/10/21 to 22/10/21
