
DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching

  • Hanke Xie
  • Dake Guo
  • Chengyou Wang
  • Yue Li
  • Wenjie Tian
  • Xinfa Zhu
  • Xinsheng Wang
  • Xiulin Li
  • Guanqiong Miao
  • Bo Liu
  • Lei Xie
  • Northwestern Polytechnical University Xian
  • DataBaker (Qingdao) Technology

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Recent advances in text-to-speech (TTS) synthesis, particularly those leveraging large language models (LLMs), have significantly improved expressiveness and naturalness. However, generating human-like, interactive dialogue speech remains challenging. Current systems are limited by the scarcity of dual-track data and by the difficulty of achieving naturalness, contextual coherence, and interactional dynamics, such as turn-taking, overlapping speech, and speaker consistency, in multi-turn conversations. To address these challenges, we propose DialoSpeech, a dual-track architecture combining a large language model with Chunked Flow Matching for expressive, human-like dialogue speech synthesis. DialoSpeech generates natural multi-turn conversations with coherent speaker turns and natural overlaps, supporting Chinese, English, and cross-lingual speech synthesis. We introduce a data processing pipeline to construct dual-track dialogue datasets, facilitating scalable training and experimental validation. Experiments show that our model outperforms baselines, offering a solution for generating human-like spoken dialogues. Code and checkpoints will be publicly released. Audio samples are available at https://tiamojames.github.io/DialoSpeech/

Original language: English
Title of host publication: 2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 807-812
Number of pages: 6
ISBN (electronic): 9798331572068
DOI
Publication status: Published - 2025
Event: 17th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2025 - Singapore, Singapore
Duration: 22 Oct 2025 to 24 Oct 2025

Publication series

Name: 2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2025

Conference

Conference: 17th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2025
Country/Territory: Singapore
City: Singapore
Period: 22/10/25 to 24/10/25
