Greetings! This is Jiatong Shi (史嘉彤)’s homepage. I’m a second-year Ph.D. student at WAVLab, LTI, CMU. Visit me at



Recent Focuses & Works

Here is a brief introduction to my recent focuses and works. I’m working on several brilliant projects these days, spanning speech and music processing.

  • My main focus is on speech. I have been working with Prof. Shinji Watanabe since September 2019. We currently focus on learning useful speech representations and applying them to various speech applications. We also actively work together on ESPnet.
  • I’m leading a group jointly working on music processing (e.g., automatic song writing, automatic music transcription, and singing voice synthesis). Recently we have been focusing on singing voice synthesis. Please check our open-source Muskits (now merged into ESPnet) on GitHub. The work is advised by Prof. Qin Jin.

Life Records

ESPnet tutorial at JSALT2022

I held a tutorial session at JSALT2022 summer school with Leo Yang. During the workshop, Leo introduced S3PRL and I introduced ESPnet with several of our recent updates. So glad to know that many students are planning to use ESPnet after the tutorial~ The slides can be found at

A Finalist in AI Song Contest 2022 – Our Submission “Be With You”

Our submission “Be With You (与你同在)” has been selected as a finalist in the AI Song Contest 2022. The work is a collaboration between people in the CS domain and in music. You can enjoy the song at Jiatong · Be with you (与你同在). Some technical details can be found in We are now in …

Publications & Awards

Publications (* for Equal Contribution)

  1. Yui Sudo, Shakeel Muhammad, Brian Yan, Jiatong Shi, and Shinji Watanabe. 2023. “4D: Joint Modeling of CTC, Attention, Transducer, and Mask-predict Decoders”. Interspeech. [details here]
  2. Jiatong Shi, Dan Berrebbi*, William Chen*, En-Pei Hu*, Wei-Ping Huang*, Ho Lam Chung*, Xuankai Chang, Shang-Wen (Daniel) Li, Abdelrahman Mohamed, Hung-yi Lee, and Shinji Watanabe. 2023. “ML-SUPERB: Multilingual Speech Universal PERformance Benchmark”. Interspeech. [details here]
  3. Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, and Shinji Watanabe. 2023. “Exploration on HuBERT with Multiple Resolution”. Interspeech. [details here]
  4. Sweta Agrawal, Antonios Anastasopoulos, Luisa Bentivogli, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, Mingda Chen, William Chen, Khalid Choukri, Alexandra Chronopoulou, Anna Currey, Thierry Declerck, Qianqian Dong, Kevin Duh, Yannick Estève, Marcello Federico, Souhir Gahbiche, Barry Haddow, Benjamin Hsu, Phu Mon Htut, Hirofumi Inaguma, Dávid Javorský, John Judge, Yasumasa Kano, Tom Ko, Rishu Kumar, Pengwei Li, Xutai Ma, Prashant Mathur, Evgeny Matusov, Paul McNamee, John P. McCrae, Kenton Murray, Maria Nadejde, Satoshi Nakamura, Matteo Negri, Ha Nguyen, Jan Niehues, Xing Niu, Atul Kr. Ojha, John E. Ortega, Proyag Pal, Juan Pino, Lonneke van der Plas, Peter Polák, Elijah Rippeth, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Yun Tang, Brian Thompson, Kevin Tran, Marco Turchi, Alex Waibel, Mingxuan Wang, Shinji Watanabe, and Rodolfo Zevallos. 2023. “Findings of the IWSLT 2023 Evaluation Campaign”. IWSLT. [details here]
  5. Brian Yan*, Jiatong Shi*, Soumi Maiti, William Chen, Xinjian Li, Yifan Peng, Siddhant Arora, and Shinji Watanabe. 2023. “CMU’s IWSLT 2023 Simultaneous Speech Translation System”. IWSLT. [details here]
  6. Brian Yan*, Jiatong Shi*, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, and Shinji Watanabe. 2023. “ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit”. ACL demo. [details here]
  7. Tao Qian, Fan Lou, Jiatong Shi, Yuning Wu, Shuai Guo, Xiang Yin, and Qin Jin. 2023. “UniLG: A Unified Structure-aware Framework for Lyrics Generation”. ACL. [details here]
  8. William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, and Shinji Watanabe. “Improving Massively Multilingual ASR with Auxiliary CTC Objectives”. ICASSP. [details here]
  9. Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, and Hung-yi Lee. “Bridging Speech and Text Pre-trained Models with Unsupervised ASR”. ICASSP. [details here]
  10. Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, and Shinji Watanabe. 2023. “Enhancing Speech-to-Speech Translation with Multiple TTS Targets”. ICASSP. [details here]
  11. Dongji Gao*, Jiatong Shi*, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, and Sanjeev Khudanpur. 2023. “EURO: ESPnet Unsupervised ASR Open-source Toolkit”. ICASSP. [details here]
  12. Yuning Wu, Jiatong Shi, Tao Qian, and Qin Jin. 2023. “PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor”. ICASSP. [details here]
  13. Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li and Hung-yi Lee. “SUPERB@ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning”. SLT. [details here]
  14. Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-yi Lee and Hao Tang. “On Compressing Sequences for Self-Supervised Speech Models”. SLT. [details here]
  15. Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vĕra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nǎdejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, and Shinji Watanabe. 2022. “Findings of the IWSLT 2022 Evaluation Campaign”. IWSLT. [details here]
  16. Jiatong Shi, George Saon, David Haws, Shinji Watanabe and Brian Kingsbury. “VQ-T: RNN Transducers using Vector-Quantized Prediction Network States”. Interspeech. [details here]
  17. Jiatong Shi, Shuai Guo, Tao Qian, Nan Huo, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe and Qin Jin. 2022. “Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis”. Interspeech. [details here]
  18. Shuai Guo*, Jiatong Shi*, Tao Qian, Shinji Watanabe and Qin Jin. “SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy”. Interspeech. [details here]
  19. Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel López-Francisco, Jonathan Amith and Shinji Watanabe. “Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation”. Interspeech. [details here]
  20. Keqi Deng, Shinji Watanabe, Jiatong Shi and Siddhant Arora. “Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation”. Interspeech. [details here]
  21. Yui Sudo, Shakeel Muhammad, Kazuhiro Nakadai, Jiatong Shi and Shinji Watanabe. “Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection”. Interspeech. [details here]
  22. Brian Yan, Patrick Fernandes, Siddharth Dalmia, Jiatong Shi, Yifan Peng, Dan Berrebbi, Xinyi Wang, Graham Neubig, and Shinji Watanabe. 2022. “CMU’s IWSLT 2022 Dialect Speech Translation System”, Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022) [details here]
  23. Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee. “SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities”. 2022. Proceedings of the Annual Meeting of the Association for Computational Linguistics. [details here]
  24. Tao Qian, Jiatong Shi, Shuai Guo, Peter Wu, and Qin Jin. “Training strategies for automatic song writing: a perspective with a unified framework”. 2022. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). [details here]
  25. Chunlei Zhang, Jiatong Shi, Chao Weng, Meng Yu, and Dong Yu. 2022. “Towards End-to-end Speaker Diarization with Generalized Neural Speaker Clustering”. ICASSP. [details here]
  26. Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu. 2022. “An Investigation of Neural Uncertainty Estimation for Target Speaker Extraction Equipped RNN Transducer”. Computer Speech and Language (CSL). [details here]
  27. Peter Wu, Paul Pu Liang, Jiatong Shi, Ruslan Salakhutdinov, Shinji Watanabe, and Louis-Philippe Morency. 2021. “Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks”. APSIPA. [details here]
  28. Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, and Alan W Black. 2021. “Acoustic Cross-lingual Transfer using Language Similarity”. ASRU. [details here]
  29. Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Gu, Jiatong Shi, Kevin Duh, Shinji Watanabe. 2021. “ESPnet-ST IWSLT 2021 Offline Speech Translation System”. IWSLT. [details here]
  30. Shu-wen Yang, Po-Han Chi*, Yung-Sung Chuang*, Cheng-I Jeff Lai*, Kushal Lakhotia*, Yist Y Lin*, Andy T Liu*, Jiatong Shi*, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee. 2021. “SUPERB: Speech processing Universal PERformance Benchmark”. Interspeech. [details here]
  31. Jiatong Shi, Jonathan D. Amith, Xuankai Chang, Siddharth Dalmia, Brian Yan, and Shinji Watanabe. 2021. “Highland Puebla Nahuatl–Spanish Speech Translation Corpus for Endangered Language Documentation”. AmericasNLP. [details here]
  32. Jonathan D. Amith, Jiatong Shi, Rey Castillo García. 2021. “End-to-End Automatic Speech Recognition: Its Impact on the Workflow for Documenting Yoloxóchitl Mixtec”. AmericasNLP. [details here]
  33. Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu. 2021. “Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation”. ICASSP. [details here]
  34. Jiatong Shi*, Shuai Guo*, Nan Huo, Yuekai Zhang, Qin Jin. 2021. “Sequence-to-sequence Singing Voice Synthesis with Perceptual Entropy Loss”. ICASSP. [details here]
  35. Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, Yuekai Zhang. 2021. “Recent Developments on ESPnet Toolkit Boosted by Conformer”. ICASSP. [details here]
  36. Jiatong Shi, Jonathan D. Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, and Shinji Watanabe. “Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec”. EACL. [details here]
  37. Jiatong Shi, Kunlin Yang, Wei Xu, and Mingming Wang. “Leveraging deep learning with audio analytics to predict the success of crowdfunding projects”. Journal of Supercomputing (2021). [details here]
  38. Jiatong Shi, Nan Huo, and Qin Jin. 2020. “Context-aware Goodness of Pronunciation for Computer Assisted Pronunciation Training”. Interspeech. [details here]
  39. Wenxin Hou, Yue Dong, Bairong Zhuang, Longfei Yang, Jiatong Shi, and Takahiro Shinozaki. 2020. “Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning”. Interspeech. [details here]
  40. Jiatong Shi, Wei Du, and Wei Xu. 2018. “Identifying Impact Factors of Question Quality in Online Health Q&A Communities: an Empirical Analysis on MedHelp”. PACIS. [details here]
  41. Jiatong Shi. 2019. “Computer Assisted Language Learning System for Young English Learner”, Undergraduate Thesis, Renmin University of China.


CMU Presidential Fellowship 2022
PhD fellowship at LTI, CMU 2021
Special Award in ‘National University Data-driven Innovation & Research Competition’ (1/594) 2018
National Level in ‘Training Programs of Innovation and Entrepreneurship for Undergraduates’ 2017
Scholarship of Academic Excellence (TOP 20%) 2016 & 2017 & 2018
Golden Prize in Beijing Art Festival of Undergraduates (Accordion Contest) 2016

Contact me

If you have any questions, please feel free to contact me~

My email is jiatongs at

Umm, I’m having a hard time deciding which one to use…
