Greetings! This is the homepage of Jiatong Shi (史嘉彤). I’m a second-year Ph.D. student at WAVLab, Language Technologies Institute (LTI), Carnegie Mellon University (CMU). Visit me at



Recent Focuses & Works

Here is a brief introduction to my recent focuses and works. These days I’m working on several exciting projects, including speech and music processing, fintech applications, and educational support.

  • My main focus is on speech. I have been working with Prof. Shinji Watanabe since September 2019. We currently focus on speech-to-speech translation using ESPnet and Fairseq.
  • I’m leading a group working jointly on music processing (e.g., automatic songwriting, automatic music transcription, and singing voice synthesis). Recently, we have been focusing on singing voice synthesis. Please check out our open-source SVS_system on GitHub. This work is advised by Prof. Qin Jin.

Life Records

ESPnet tutorial at JSALT2022

I co-hosted a tutorial session with Leo Yang at the JSALT2022 summer school. During the session, Leo introduced S3PRL, and I introduced ESPnet along with several of our recent updates. I was so glad to hear that many students are planning to use ESPnet after the tutorial~ The slides can be found at

Finalist of the AI Song Contest 2022 – Our Submission “Be With You”

Our submission “Be With You (与你同在)” was selected as a finalist in the AI Song Contest 2022. The work is a collaboration between people in computer science and in music. You can enjoy the song at Jiatong · Be with you (与你同在). Some technical details can be found in We are now in …

Publications & Awards

Publications (* denotes equal contribution)

  1. Jiatong Shi, George Saon, David Haws, Shinji Watanabe, and Brian Kingsbury. 2022. “VQ-T: RNN Transducers using Vector-Quantized Prediction Network States”. Accepted by Interspeech 2022.
  2. Jiatong Shi, Shuai Guo, Tao Qian, Nan Huo, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, and Qin Jin. 2022. “Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis”. Accepted by Interspeech 2022.
  3. Shuai Guo*, Jiatong Shi*, Tao Qian, Shinji Watanabe, and Qin Jin. 2022. “SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy”. Accepted by Interspeech 2022.
  4. Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel López-Francisco, Jonathan Amith, and Shinji Watanabe. 2022. “Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation”. Accepted by Interspeech 2022.
  5. Keqi Deng, Shinji Watanabe, Jiatong Shi, and Siddhant Arora. 2022. “Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation”. Accepted by Interspeech 2022.
  6. Yui Sudo, Shakeel Muhammad, Kazuhiro Nakadai, Jiatong Shi, and Shinji Watanabe. 2022. “Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection”. Accepted by Interspeech 2022.
  7. Brian Yan, Patrick Fernandes, Siddharth Dalmia, Jiatong Shi, Yifan Peng, Dan Berrebbi, Xinyi Wang, Graham Neubig, and Shinji Watanabe. 2022. “CMU’s IWSLT 2022 Dialect Speech Translation System”. Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022).
  8. Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee. 2022. “SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities”. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 8479-8492. [details here]
  9. Tao Qian, Jiatong Shi, Shuai Guo, Peter Wu, and Qin Jin. 2022. “Training strategies for automatic song writing: a perspective with a unified framework”. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4738-4742. [details here]
  10. Chunlei Zhang, Jiatong Shi, Chao Weng, Meng Yu, and Dong Yu. 2022. “Towards End-to-end Speaker Diarization with Generalized Neural Speaker Clustering”. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8372-8376. [details here]
  11. Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu. 2022. “An Investigation of Neural Uncertainty Estimation for Target Speaker Extraction Equipped RNN Transducer”. Computer Speech and Language (CSL), 73: 10327. [details here]
  12. Peter Wu, Paul Pu Liang, Jiatong Shi, Ruslan Salakhutdinov, Shinji Watanabe, and Louis-Philippe Morency. 2021. “Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks”. Proceedings of 2021 APSIPA Annual Summit and Conference, pp. 841-848. [details here]
  13. Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, and Alan W Black. 2021. “Acoustic Cross-lingual Transfer using Language Similarity”. Proceedings of 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 1050-1057. [details here]
  14. Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, and Shinji Watanabe. 2021. “ESPnet-ST IWSLT 2021 Offline Speech Translation System”. Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021), pp. 100-109. [details here]
  15. Shu-wen Yang, Po-Han Chi*, Yung-Sung Chuang*, Cheng-I Jeff Lai*, Kushal Lakhotia*, Yist Y Lin*, Andy T Liu*, Jiatong Shi*, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee. 2021. “SUPERB: Speech processing Universal PERformance Benchmark”. Proceedings of Interspeech 2021, International Speech Communication Association, pp. 1194-1198. [details here]
  16. Jiatong Shi, Jonathan D. Amith, Xuankai Chang, Siddharth Dalmia, Brian Yan, and Shinji Watanabe. 2021. “Highland Puebla Nahuatl–Spanish Speech Translation Corpus for Endangered Language Documentation”. Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pp. 53-63. [details here]
  17. Jonathan D. Amith, Jiatong Shi, and Rey Castillo García. 2021. “End-to-End Automatic Speech Recognition: Its Impact on the Workflow for Documenting Yoloxóchitl Mixtec”. Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pp. 64-80. [details here]
  18. Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu. 2021. “Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation”. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6908-6912. [details here]
  19. Jiatong Shi*, Shuai Guo*, Nan Huo, Yuekai Zhang, and Qin Jin. 2021. “Sequence-to-sequence Singing Voice Synthesis with Perceptual Entropy Loss”. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 76-80. [details here]
  20. Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, and Yuekai Zhang. 2021. “Recent Developments on ESPnet Toolkit Boosted by Conformer”. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5874-5878. [details here]
  21. Jiatong Shi, Jonathan D. Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, and Shinji Watanabe. 2021. “Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec”. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021), pp. 1134-1145. [details here]
  22. Jiatong Shi, Kunlin Yang, Wei Xu, and Mingming Wang. 2021. “Leveraging deep learning with audio analytics to predict the success of crowdfunding projects”. Journal of Supercomputing. [details here]
  23. Jiatong Shi, Nan Huo, and Qin Jin. 2020. “Context-aware Goodness of Pronunciation for Computer Assisted Pronunciation Training”. Proceedings of Interspeech 2020, International Speech Communication Association, pp. 3057-3061. [details here]
  24. Wenxin Hou, Yue Dong, Bairong Zhuang, Longfei Yang, Jiatong Shi, and Takahiro Shinozaki. 2020. “Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning”. Proceedings of Interspeech 2020, International Speech Communication Association, pp. 1047-1051. [details here]
  25. Jiatong Shi, Wei Du, and Wei Xu. 2018. “Identifying Impact Factors of Question Quality in Online Health Q&A Communities: an Empirical Analysis on MedHelp”. Proceedings of the 22nd Pacific Asia Conference on Information Systems (PACIS 2018), Association of Information Systems, pp. 1886-1898. [details here]
  26. Jiatong Shi. 2019. “Computer Assisted Language Learning System for Young English Learner”. Undergraduate Thesis, Renmin University of China.



Awards

Special Award in ‘National University Data-driven Innovation & Research Competition’ (1/594), 2018
National Level in ‘Training Programs of Innovation and Entrepreneurship for Undergraduates’, 2017
Scholarship of Academic Excellence (top 20%), 2016, 2017 & 2018
Golden Prize in Beijing Art Festival of Undergraduates (Accordion Contest), 2016

Contact me

If you have any questions, please feel free to contact me~

My email is jiatong_shi at


Umm, I’m having a hard time deciding which one to use…
