Drop the Beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation

Ziqian Ning, Shuai Wang, Yuepeng Jiang, Jixun Yao, Lei He, Shifeng Pan, Jie Ding, Lei Xie

科研成果: 期刊稿件会议文章同行评审

摘要

Rap, a prominent genre of vocal performance, remains underexplored in vocal generation. General vocal synthesis depends on precise note and duration inputs, requiring users to have related musical knowledge, which limits flexibility. In contrast, rap typically features simpler melodies, with a core focus on a strong rhythmic sense that harmonizes with accompanying beats. In this paper, we propose Freestyler, the first system that generates rapping vocals directly from lyrics and accompaniment inputs. Freestyler utilizes language model-based token generation, followed by a conditional flow matching model to produce spectrograms and a neural vocoder to restore audio. It allows a 3-second prompt to enable zero-shot timbre control. Due to the scarcity of publicly available rap datasets, we also present RapBank, a rap song dataset collected from the internet, alongside a meticulously designed processing pipeline. Experimental results show that Freestyler produces high-quality rapping voice generation with enhanced naturalness and strong alignment with accompanying beats, both stylistically and rhythmically.

源语言英语
页(从-至)24966-24974
页数9
期刊Proceedings of the AAAI Conference on Artificial Intelligence
39
23
DOI
出版状态已出版 - 11 4月 2025
活动39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, 美国
期限: 25 2月 20254 3月 2025

指纹

探究 'Drop the Beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation' 的科研主题。它们共同构成独一无二的指纹。

引用此