Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis

Yu Wang; Xinsheng Wang; Pengcheng Zhu; Jie Wu; Hanzhao Li; Heyang Xue; Yongmao Zhang; Lei Xie; Mengxiao Bi

doi:10.21437/Interspeech.2022-48

School of Computer Science

Research output: Contribution to journal › Conference article › peer-review

45 Scopus citations

Abstract

This paper introduces Opencpop, a publicly available high-quality Mandarin singing corpus designed for singing voice synthesis (SVS). The corpus consists of 100 popular Mandarin songs performed by a female professional singer. Audio files are recorded with studio quality at a sampling rate of 44,100 Hz and the corresponding lyrics and musical scores are provided. All singing recordings have been phonetically annotated with phoneme boundaries and syllable (note) boundaries. To demonstrate the reliability of the released data and to provide a baseline for future research, we built baseline deep neural network-based SVS models and evaluated them with both objective metrics and subjective mean opinion score (MOS) measure. Experimental results show that the best SVS model trained on our database achieves 3.70 MOS, indicating the reliability of the provided corpus. Opencpop is released to the open-source community WeNet, and the corpus, as well as synthesized demos, can be found on the project homepage.

Original language	English
Pages (from-to)	4242-4246
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2022-September
DOIs	https://doi.org/10.21437/Interspeech.2022-48
State	Published - 2022
Event	23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Republic of Korea
Duration	18 Sep 2022 → 22 Sep 2022

Keywords

Singing voice synthesis
benchmark
corpus
open source
text-to-speech

Access to Document

10.21437/Interspeech.2022-48

Cite this

@article{a48706732764427389e832ae56529c6f,

title = "Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis",

abstract = "This paper introduces Opencpop, a publicly available high-quality Mandarin singing corpus designed for singing voice synthesis (SVS). The corpus consists of 100 popular Mandarin songs performed by a female professional singer. Audio files are recorded with studio quality at a sampling rate of 44,100 Hz and the corresponding lyrics and musical scores are provided. All singing recordings have been phonetically annotated with phoneme boundaries and syllable (note) boundaries. To demonstrate the reliability of the released data and to provide a baseline for future research, we built baseline deep neural network-based SVS models and evaluated them with both objective metrics and subjective mean opinion score (MOS) measure. Experimental results show that the best SVS model trained on our database achieves 3.70 MOS, indicating the reliability of the provided corpus. Opencpop is released to the open-source community WeNet, and the corpus, as well as synthesized demos, can be found on the project homepage.",

keywords = "Singing voice synthesis, benchmark, corpus, open source, text-to-speech",

author = "Yu Wang and Xinsheng Wang and Pengcheng Zhu and Jie Wu and Hanzhao Li and Heyang Xue and Yongmao Zhang and Lei Xie and Mengxiao Bi",

note = "Publisher Copyright: Copyright {\textcopyright} 2022 ISCA.; 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 ; Conference date: 18-09-2022 Through 22-09-2022",

year = "2022",

doi = "10.21437/Interspeech.2022-48",

language = "English",

volume = "2022-September",

pages = "4242--4246",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis. / Wang, Yu; Wang, Xinsheng; Zhu, Pengcheng et al.
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2022-September, 2022, p. 4242-4246.

TY - JOUR

T1 - Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis

T2 - 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022

AU - Wang, Yu

AU - Wang, Xinsheng

AU - Zhu, Pengcheng

AU - Wu, Jie

AU - Li, Hanzhao

AU - Xue, Heyang

AU - Zhang, Yongmao

AU - Xie, Lei

AU - Bi, Mengxiao

PY - 2022

Y1 - 2022

N2 - This paper introduces Opencpop, a publicly available high-quality Mandarin singing corpus designed for singing voice synthesis (SVS). The corpus consists of 100 popular Mandarin songs performed by a female professional singer. Audio files are recorded with studio quality at a sampling rate of 44,100 Hz and the corresponding lyrics and musical scores are provided. All singing recordings have been phonetically annotated with phoneme boundaries and syllable (note) boundaries. To demonstrate the reliability of the released data and to provide a baseline for future research, we built baseline deep neural network-based SVS models and evaluated them with both objective metrics and subjective mean opinion score (MOS) measure. Experimental results show that the best SVS model trained on our database achieves 3.70 MOS, indicating the reliability of the provided corpus. Opencpop is released to the open-source community WeNet, and the corpus, as well as synthesized demos, can be found on the project homepage.

AB - This paper introduces Opencpop, a publicly available high-quality Mandarin singing corpus designed for singing voice synthesis (SVS). The corpus consists of 100 popular Mandarin songs performed by a female professional singer. Audio files are recorded with studio quality at a sampling rate of 44,100 Hz and the corresponding lyrics and musical scores are provided. All singing recordings have been phonetically annotated with phoneme boundaries and syllable (note) boundaries. To demonstrate the reliability of the released data and to provide a baseline for future research, we built baseline deep neural network-based SVS models and evaluated them with both objective metrics and subjective mean opinion score (MOS) measure. Experimental results show that the best SVS model trained on our database achieves 3.70 MOS, indicating the reliability of the provided corpus. Opencpop is released to the open-source community WeNet, and the corpus, as well as synthesized demos, can be found on the project homepage.

KW - Singing voice synthesis

KW - benchmark

KW - corpus

KW - open source

KW - text-to-speech

UR - http://www.scopus.com/inward/record.url?scp=85140065125&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2022-48

DO - 10.21437/Interspeech.2022-48

M3 - Conference article

AN - SCOPUS:85140065125

SN - 2308-457X

VL - 2022-September

SP - 4242

EP - 4246

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Y2 - 18 September 2022 through 22 September 2022

ER -
