TY - GEN
T1 - PromptSpeaker
T2 - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
AU - Zhang, Yongmao
AU - Liu, Guanghou
AU - Lei, Yi
AU - Chen, Yunlin
AU - Yin, Hao
AU - Xie, Lei
AU - Li, Zhifei
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Recently, text-guided content generation has received extensive attention. In this work, we explore the possibility of text description-based speaker generation, i.e., using text prompts to control the speaker generation process. Specifically, we propose PromptSpeaker, a text-guided speaker generation system. PromptSpeaker consists of a prompt encoder, a zero-shot VITS, and a Glow model. The prompt encoder predicts a prior distribution from the text description and samples from this distribution to obtain a semantic representation. The Glow model then converts the semantic representation into a speaker representation, and the zero-shot VITS finally synthesizes the speaker's voice from the speaker representation. Objective metrics verify that PromptSpeaker can generate new speakers that differ from those in the training set, and the synthetic speaker voice has reasonable subjective matching quality with the speaker prompt. Our audio samples are available on the demo website: https://promptspeaker.github.io/demo/
AB - Recently, text-guided content generation has received extensive attention. In this work, we explore the possibility of text description-based speaker generation, i.e., using text prompts to control the speaker generation process. Specifically, we propose PromptSpeaker, a text-guided speaker generation system. PromptSpeaker consists of a prompt encoder, a zero-shot VITS, and a Glow model. The prompt encoder predicts a prior distribution from the text description and samples from this distribution to obtain a semantic representation. The Glow model then converts the semantic representation into a speaker representation, and the zero-shot VITS finally synthesizes the speaker's voice from the speaker representation. Objective metrics verify that PromptSpeaker can generate new speakers that differ from those in the training set, and the synthetic speaker voice has reasonable subjective matching quality with the speaker prompt. Our audio samples are available on the demo website: https://promptspeaker.github.io/demo/
KW - Prompt
KW - Speaker Generation
KW - Text-to-Speech
UR - http://www.scopus.com/inward/record.url?scp=85184660751&partnerID=8YFLogxK
U2 - 10.1109/ASRU57964.2023.10389772
DO - 10.1109/ASRU57964.2023.10389772
M3 - Conference contribution
AN - SCOPUS:85184660751
T3 - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
BT - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 16 December 2023 through 20 December 2023
ER -