TeNC: Low Bit-Rate Speech Coding with VQ-VAE and GAN

  • Yi Chen
  • Shan Yang
  • Na Hu
  • Lei Xie
  • Dan Su

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

5 Scopus citations

Abstract

Speech coding aims to compress digital speech signals into fewer bits and reconstruct them back into raw signals while preserving speech quality as much as possible. However, conventional codecs usually need a high bit-rate to reconstruct speech of reasonably high quality. In this paper, we propose an end-to-end neural generative codec that combines a VQ-VAE based auto-encoder with a generative adversarial network (GAN) and reconstructs high-fidelity speech at a low bit-rate of about 2 kb/s. Compression is performed by a down-sampling module in the encoder together with a learnable discrete codebook, and the GAN further improves the quality of the reconstructed speech. Our experiments confirm the effectiveness of the proposed model in both objective and subjective tests: it significantly outperforms conventional codecs at low bit-rates in terms of speech quality and speaker similarity.
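The compression claim in the abstract can be made concrete with a little arithmetic: if the encoder down-samples the waveform by a factor D and each resulting latent frame is replaced by the index of its nearest codeword in a codebook of K entries, the transmitted rate is (sample rate / D) × log2(K) bits per second. The pure-Python sketch below shows that calculation together with nearest-neighbour codebook lookup. It is a minimal illustration under stated assumptions, not the TeNC implementation: the function names, the 16 kHz sample rate, the down-sampling factor of 64 and the 256-entry codebook are hypothetical values chosen only because they land on 2 kb/s, and the learned encoder, decoder, codebook training and GAN discriminator are all omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def bitrate_bps(sample_rate_hz: int, downsample_factor: int,
                codebook_size: int, num_codebooks: int = 1) -> float:
    """Bits per second when every latent frame is sent as one index per codebook.

    Hypothetical helper: the paper's actual sample rate, down-sampling
    factor and codebook size are not given in this abstract.
    """
    frames_per_second = sample_rate_hz / downsample_factor
    bits_per_index = math.log2(codebook_size)
    return frames_per_second * bits_per_index * num_codebooks


def quantize(latents: Sequence[Vector],
             codebook: Sequence[Vector]) -> Tuple[List[int], List[List[float]]]:
    """Nearest-neighbour vector quantization (squared Euclidean distance).

    Returns the integer indices (what a codec would transmit) and the
    selected codewords (what the decoder would receive as input).
    """
    indices: List[int] = []
    for z in latents:
        # Squared distance from this latent frame to every codeword.
        dists = [sum((a - b) ** 2 for a, b in zip(z, c)) for c in codebook]
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    return indices, [list(codebook[i]) for i in indices]


def dequantize(indices: Sequence[int],
               codebook: Sequence[Vector]) -> List[List[float]]:
    """Receiver side: look the transmitted indices back up in the shared codebook."""
    return [list(codebook[i]) for i in indices]


if __name__ == "__main__":
    print(bitrate_bps(16000, 64, 256))  # 250 frames/s x 8 bits = 2000.0 b/s
    toy_codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
    idx, _ = quantize([[0.9, 1.2], [0.1, -0.2], [-0.8, 1.7]], toy_codebook)
    print(idx)  # [1, 0, 2]
```

With these hypothetical settings only the integer indices cross the channel; the decoder keeps its own copy of the codebook, recovers the vectors with a table lookup and then synthesizes the waveform. In a trained VQ-VAE the codebook is learned jointly with the encoder (typically with a straight-through gradient estimator), which this sketch does not attempt.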

Original language: English
Title of host publication: ICMI 2021 Companion - Companion Publication of the 2021 International Conference on Multimodal Interaction
Publisher: Association for Computing Machinery, Inc
Pages: 126-130
Number of pages: 5
ISBN (Electronic): 9781450384711
State: Published, 18 Oct 2021
Externally published: Yes
Event: 23rd ACM International Conference on Multimodal Interaction, ICMI 2021, Virtual, Online, Canada
Duration: 18 Oct 2021 to 22 Oct 2021

Publication series

Name: ICMI 2021 Companion - Companion Publication of the 2021 International Conference on Multimodal Interaction

Conference

Conference: 23rd ACM International Conference on Multimodal Interaction, ICMI 2021
Country/Territory: Canada
City: Virtual, Online
Period: 18/10/21 to 22/10/21

Keywords

  • Codec
  • GAN
  • VQ-VAE
  • low bit-rate
  • neural speech coding
