shijt – Jiatong Shi's Homepage

2022-07-01

ESPnet tutorial at JSALT2022

I held a tutorial session at JSALT2022 summer school with Leo Yang. During the workshop, Leo introduced S3PRL and I introduced ESPnet with several of our recent updates. So glad to know that many students are planning to use ESPnet after the tutorial~

The slides can be found at https://github.com/ftshijt/PublicLectureSlides/blob/main/JSALT_tutorial2022%20(1).pdf

2022-06-20 (updated 2022-06-30)

The Finalist of AI Song Contest 2022 – Our Submission “Be With You”

Our submission “Be With You (与你同在)” has been selected as a finalist in the AI Song Contest 2022. The work is a collaboration between people from both computer science and music.

You can enjoy the song at

Jiatong · Be with you (与你同在)

Some technical details can be found at https://sjtmusicteam.github.io/MuskitsPage/

The public vote is now open at https://www.aisongcontest.com/the-2022-finalists and we would appreciate your vote if you like the song~

2022-05-21

Lecture at Natural Language Processing (11-411/611)

As a TA of NLP at CMU this year, I’m offering a lecture on speech processing that introduces various speech tasks. Please feel free to check it here

2021-10-07

Personal Statement for PhD Application

It has been a while since my PhD application. Though a personal statement (PS) carries only a small weight in the entire application, it might be the most time-consuming part. I received tons of help when writing the PS. Most of all, I want to thank Jonathan D. Amith, who is also a co-author on some of my papers. He weighed every word very carefully and helped me through several rounds of revision. Many other people also kindly gave suggestions on the PS, including Shinji, Chao, Chunlei, and Xuankai. Meanwhile, I also learned a lot from people who shared their PS over the Internet. Therefore, I finally decided to post my PS on the website as well, in case it helps people who would like to know my story. The current version is NOT the final version, but it is very close, because in the last few days we decided to use Word for revision instead of LaTeX (easier for my “reviewers”).

The PS can be found here

2021-05-16 (updated 2024-08-29)

KiSing: the First Open-source Mandarin Singing Voice Synthesis Corpus

We are pleased to publish KiSing, the first open-source Mandarin singing corpus built specifically for singing voice synthesis (SVS).

Corpus Specifics

This corpus consists of singing voices and their corresponding musical and phonetic annotation. The specification is as follows.

14 songs from Keyi (Kiki) Zhang (composer, lyricist, singer)
High quality (recorded in a professional recording studio) and high sampling rate (48 kHz)
Free for non-commercial use (See “terms of use”)
Other useful data (MIDI, phoneme labels with specific duration information)

Download

Segmented singing, MIDI, and phonetic labels

The Singer

Keyi (Kiki) Zhang, 张钶浥, is a talented Chinese female singer, composer, and lyricist. She has published around 30 songs in a variety of styles. The KiSing corpus, named after Kiki, mainly consists of some of her published songs. The versions with accompaniment can be found on both QQ Music and NetEase Cloud Music. Feel free to check them out!

Terms of Use

All the data in the corpus is licensed under Creative Commons Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0).

Main Contributors

Jiatong Shi, The Johns Hopkins University, jiatong_shi@jhu.edu

Keyi (Kiki) Zhang, the singer, composer, and lyricist

Zhaodong Yao, the writer for the music score (i.e., MIDI) annotation

Other Resources

The corresponding recipe to train a singing voice synthesis system will be released soon in Muskits

Citation

Shi, J., Guo, S., Qian, T., Hayashi, T., Wu, Y., Xu, F., Chang, X., Li, H., Wu, P., Watanabe, S., Jin, Q. (2022) Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis. Proc. Interspeech 2022, 4277-4281, doi: 10.21437/Interspeech.2022-10039

@inproceedings{shi22d_interspeech,
  author={Jiatong Shi and Shuai Guo and Tao Qian and Tomoki Hayashi and Yuning Wu and Fangzheng Xu and Xuankai Chang and Huazhe Li and Peter Wu and Shinji Watanabe and Qin Jin},
  title={{Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={4277--4281},
  doi={10.21437/Interspeech.2022-10039},
  issn={2958-1796}
}

Acknowledgment

The project is supported by the AIM3 lab at Renmin University of China. We would also like to thank Bingrong Shi and Yunhong Wei for correcting the phonetic alignment.

2020-08-07 (updated 2020-10-11)

CaGOP vs. GOP (Examples for Interspeech 2020)

2019-09-23 (updated 2019-09-25)

Homework for Spoken Language Processing (Renmin University of China)

After much effort on discussion and design, the first published homework is out now. The workload is large, but it is really worth it since the Kaldi structure is really awesome! Hopefully, new SLP students will benefit a lot from the homework.

The homework explores the basic functions of Kaldi. Though speech recognition nowadays usually relies on large datasets, we tried our best to find a tutorial with a small dataset that can also be handled on laptops.

A quick review version is here (Kaldi-Tutorial).

2019-02-15

Samples of Music Generation System

Here are samples for our self-made music generation system.

The system takes English sentences longer than 10 words as input. It then analyzes their sentiment and maps it to one of 16 sentiment classes. Given that class, the system generates a piece of music for the sentiment.
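Before the samples, here is a rough illustration of the input check and the sentiment-to-class mapping described above. It is only a sketch, not the system's actual code: the toy word lists, the score formula, the uniform binning into 16 classes, and the function names are all assumptions made for illustration, and the music generation step itself is not covered.

```python
import re

NUM_CLASSES = 16  # from the post: sentiment is mapped to one of 16 classes
MIN_WORDS = 10    # from the post: inputs must be longer than 10 words

# Hypothetical toy lexicon; a real system would use a trained sentiment model.
POSITIVE = {"beautiful", "merry", "dear", "mild", "flowers", "summer"}
NEGATIVE = {"death", "grief", "sorrow", "dismal", "dejected", "deceived"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(words):
    """Return a score in [-1, 1]; 0.0 when no lexicon word is found."""
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def sentiment_class(text):
    """Validate the input length and map the text to a class index 0..15."""
    words = tokenize(text)
    if len(words) <= MIN_WORDS:
        raise ValueError("input must be longer than 10 words")
    score = sentiment_score(words)
    # Assumed mapping: split [-1, 1] uniformly into NUM_CLASSES bins.
    idx = int((score + 1.0) / 2.0 * NUM_CLASSES)
    return min(idx, NUM_CLASSES - 1)  # keep score == 1.0 inside the last bin
```

With these toy word lists, calling `sentiment_class` on the first sample sentence below returns class 12, and an input of 10 words or fewer raises a `ValueError`.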

Input: “Let life be beautiful like summer flowers And Death like autumn leaves.”

Output: (the sample is generated based on Butterfly Lovers but with a variation in the second part)

Input: “If by life you were deceived, Don’t be dismal, don’t be wild! In the day of grief, be mild. Merry days will come, believe. Heart is living in tomorrow; Present is dejected here; In a moment, passes sorrow; That which passes will be dear.”

Output:

2018-11-23

Journey of NetEase is Coming to an End

Six months ago, I joined the AI group of the Youdao business department at NetEase as a machine learning intern. It’s time to say goodbye now.

During this period, I mainly worked on a Computer-Assisted Language Learning (CALL) project. Several improvements were achieved, not only for the company but also for myself.

To be specific about my progress, I tested two possible new models to improve the original model (though both turned out to fail…). In addition, I designed a scoring model that rescales the raw output of the acoustic model so that the final scores follow a well-behaved distribution. Moreover, I implemented stress detection and intonation detection algorithms in the system. In the last month, a severe alignment problem was detected. To address it, I applied an HCLG graph and reached a better result.

It was a tough time for me, learning so much about ASR, CALL, and signal processing. Besides, from the internship, I got a much clearer picture of medium-size system design beyond school assignments. Many thanks to Yixuan Xiao, my mentor during the internship, and thanks a lot to Youdao~

2018-10-10

Eye Tracking Experiment in Beijing Children’s Palace

We are currently working on an eye-tracking project focusing on sight-reading for the accordion. With the help of a Tobii eye tracker, fixations and eye movements can be measured accurately. Based on several eye movement features, we will further extract sight-reading patterns from our data and offer suggestions for current teaching theory. Many thanks to Beijing Children’s Palace for the abundant data support.