Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation

Hanzhao Li, Liumeng Xue, Haohan Guo, Xinfa Zhu, Yuanjun Lv, Lei Xie, Yunlin Chen, Hao Yin, Zhifei Li

Research output: Conference article in journal, peer-reviewed

8 Citations (Scopus)

Abstract

The multi-codebook speech codec enables the application of large language models (LLMs) in TTS but bottlenecks efficiency and robustness due to multi-sequence prediction. To avoid this obstacle, we propose Single-Codec, a single-codebook, single-sequence codec, which employs a disentangled VQ-VAE to decouple speech into a time-invariant embedding and a phonetically-rich discrete sequence. Furthermore, the encoder is enhanced with 1) contextual modeling with a BLSTM module to exploit the temporal information, 2) a hybrid sampling module to alleviate distortion from upsampling and downsampling, and 3) a resampling module to encourage discrete units to carry more phonetic information. Compared with multi-codebook codecs, e.g., EnCodec and TiCodec, Single-Codec demonstrates higher reconstruction quality with a lower bandwidth of only 304 bps. The effectiveness of Single-Codec is further validated by LLM-TTS experiments, showing improved naturalness and intelligibility.
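The single-codebook quantization at the heart of such a codec can be illustrated with a minimal nearest-neighbor VQ sketch (a toy NumPy example; the function and variable names are hypothetical and this is not the authors' implementation, which adds the disentangled global embedding and the encoder modules described above):

```python
import numpy as np

def quantize(frames, codebook):
    """Nearest-neighbor vector quantization: map each encoder frame
    to the index of its closest codebook entry, yielding one discrete
    unit per frame -- a single token sequence, as in a single-codebook
    codec (vs. the parallel sequences of a multi-codebook RVQ)."""
    # Squared Euclidean distance between every frame and every code.
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = d.argmin(axis=1)            # one discrete id per frame
    return indices, codebook[indices]     # ids + quantized embeddings

# Toy example: 4 encoder frames, a codebook of 8 entries, 2-D embeddings.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 2))
frames = codebook[[3, 1, 3, 6]] + 0.01 * rng.normal(size=(4, 2))
ids, quantized = quantize(frames, codebook)
```

A decoder then reconstructs speech from the quantized embeddings (plus, in Single-Codec, the time-invariant utterance embedding); an LLM-based TTS model only has to predict the single `ids` sequence.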

Original language: English
Pages (from-to): 3390-3394
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOI
Publication status: Published - 2024
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: 1 Sep 2024 - 5 Sep 2024
