We are pleased to publish KiSing, the first open-source Mandarin singing corpus built specifically for singing voice synthesis (SVS).
Corpus Specifics
This corpus consists of singing voices and their corresponding musical and phonetic annotations. The specifications are as follows.
- 14 songs from Keyi (Kiki) Zhang (composer, lyricist, singer)
- High quality (recorded in a professional recording studio) and high sampling rate (48 kHz)
- Free for non-commercial use (See “terms of use”)
- Other useful data (MIDI, phoneme labels with specific duration information)
Download
Segmented singing, MIDI, and phonetic labels
The Singer
Keyi (Kiki) Zhang, 张钶浥, is a talented Chinese female singer, composer, and lyricist. She has published around 30 songs in a variety of styles. The KiSing corpus, named after her nickname Kiki, mainly consists of a selection of her published songs. The songs with accompaniment can be found on both QQ Music and NetEase Cloud Music. Feel free to check them out!
Terms of Use
All the data in the corpus is licensed under Creative Commons Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0).
Main Contributors
Jiatong Shi, The Johns Hopkins University, jiatong_shi@jhu.edu
Keyi (Kiki) Zhang, the singer, composer, and lyricist
Zhaodong Yao, writer of the music score (i.e., MIDI) annotations
Other Resources
The corresponding recipe for training a singing voice synthesis system will be released soon in Muskits.
Citation
Shi, J., Guo, S., Qian, T., Hayashi, T., Wu, Y., Xu, F., Chang, X., Li, H., Wu, P., Watanabe, S., Jin, Q. (2022) Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis. Proc. Interspeech 2022, 4277-4281, doi: 10.21437/Interspeech.2022-10039

@inproceedings{shi22d_interspeech,
  author={Jiatong Shi and Shuai Guo and Tao Qian and Tomoki Hayashi and Yuning Wu and Fangzheng Xu and Xuankai Chang and Huazhe Li and Peter Wu and Shinji Watanabe and Qin Jin},
  title={{Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={4277--4281},
  doi={10.21437/Interspeech.2022-10039},
  issn={2958-1796}
}
Acknowledgment
The project is supported by the AIM3 lab at Renmin University of China. We would also like to thank Bingrong Shi and Yunhong Wei for correcting the phonetic alignment.
Hello!
Thank you for open-sourcing the data.
I am currently writing an SVS-related paper.
I have a question:
Why do the durations of the MIDI and the wav files differ?
Hello, in the released data the wav files are segmented, while the MIDI corresponds to all the wavs of a song concatenated together, so there is some mismatch between them. We plan to release an update on how to process the data, likely through our toolkit Muskits https://github.com/SJTMusicTeam/Muskits. Please also note that the phones in the currently released data are not very balanced, which makes SVS training less effective; a more general phone representation may be needed. If you are only doing SVS-related research, we suggest starting with one of the Japanese corpora first. We will later release a V2 of this corpus to address this issue, but a concrete release date has not been set yet.
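To illustrate the mismatch described above: if the MIDI note times live on the concatenated full-song timeline, each segment's start offset is simply the cumulative duration of the segments before it, and a note time can be shifted into a segment's local timeline by subtracting that offset. The sketch below uses only the Python standard library; the durations and the ordering of segments are hypothetical, and in practice the segment order must match the actual corpus release.

```python
import wave
import itertools

def wav_duration(path):
    """Duration of a wav file in seconds, using only the stdlib."""
    with wave.open(path, "rb") as f:
        return f.getnframes() / f.getframerate()

def segment_offsets(seg_durations):
    """Given per-segment durations (seconds) in concatenation order,
    return each segment's start time on the full-song timeline."""
    return [0.0] + list(itertools.accumulate(seg_durations))[:-1]

# Hypothetical durations of three consecutive segments of one song:
durs = [12.5, 30.0, 7.5]
offsets = segment_offsets(durs)  # [0.0, 12.5, 42.5]

# A note at t = 40.0 s on the concatenated MIDI timeline falls in the
# second segment (offset 12.5 s), at local time 40.0 - 12.5 = 27.5 s.
```

In a real pipeline one would compute `durs` by calling `wav_duration` on each segmented wav (in the correct order) and then shift every MIDI event into its segment's local timeline before pairing it with that segment's audio and phoneme labels.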