Greetings! This is Jiatong Shi (史嘉彤)’s homepage. I’m a second-year Ph.D. student at WAVLab, LTI, CMU.
Recent Focuses & Works
Here is a brief introduction to my recent focuses and works. I’m currently working on several projects in speech and music processing.
- My main focus is on speech processing. I have been working with Prof. Shinji Watanabe since September 2019. We currently focus on speech-to-speech translation with ESPnet and Fairseq.
- I’m leading a group jointly working on music processing (e.g., automatic song writing, automatic music transcription, and singing voice synthesis), with a recent focus on singing voice synthesis. Please check out our open-source toolkit Muskits (now merged into ESPnet) on GitHub. The work is advised by Prof. Qin Jin.
Life Records
ESPnet tutorial at JSALT2022
I held a tutorial session at the JSALT2022 summer school with Leo Yang. During the workshop, Leo introduced S3PRL, and I introduced ESPnet along with several of our recent updates. I’m glad to know that many students are planning to use ESPnet after the tutorial! The slides can be found at https://github.com/ftshijt/PublicLectureSlides/blob/main/JSALT_tutorial2022%20(1).pdf
The Finalist of AI Song Contest 2022 – Our Submission “Be With You”
Our submission “Be With You (与你同在)” has been selected as a finalist in the AI Song Contest 2022. The work is a collaboration between people from both computer science and music. You can enjoy the song at Jiatong · Be with you (与你同在). Some technical details can be found at https://sjtmusicteam.github.io/MuskitsPage/ We are now in …
Lecture for Natural Language Processing (11-411/611)
As a TA for Natural Language Processing at CMU this year, I’m giving a lecture on speech processing that introduces a range of speech tasks. Please feel free to check it out here.
Publications & Awards
Publications (* denotes equal contribution)
- William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, and Shinji Watanabe. “Improving Massively Multilingual ASR with Auxiliary CTC Objectives” (accepted by ICASSP2023).
- Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, and Hung-yi Lee. “Bridging Speech and Text Pre-trained Models with Unsupervised ASR” (accepted by ICASSP2023).
- Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, and Shinji Watanabe. 2023. “Enhancing Speech-to-Speech Translation with Multiple TTS Targets” (accepted by ICASSP2023).
- Dongji Gao*, Jiatong Shi*, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, and Sanjeev Khudanpur. 2023. "EURO: ESPnet Unsupervised ASR Open-source Toolkit" (accepted by ICASSP2023).
- Yuning Wu, Jiatong Shi, Tao Qian, and Qin Jin. 2023. “PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor” (accepted by ICASSP2023).
- Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li and Hung-yi Lee. "SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning". SLT. [details here]
- Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-yi Lee and Hao Tang. “On Compressing Sequences for Self-Supervised Speech Models”. SLT. [details here]
- Jiatong Shi, George Saon, David Haws, Shinji Watanabe and Brian Kingsbury. “VQ-T: RNN Transducers using Vector-Quantized Prediction Network States”. Interspeech. [details here]
- Jiatong Shi, Shuai Guo, Tao Qian, Nan Huo, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe and Qin Jin. 2022. “Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis”. Interspeech. [details here]
- Shuai Guo*, Jiatong Shi*, Tao Qian, Shinji Watanabe and Qin Jin. “SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy”. Interspeech. [details here]
- Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel López-Francisco, Jonathan Amith and Shinji Watanabe. “Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation”. Interspeech. [details here]
- Keqi Deng, Shinji Watanabe, Jiatong Shi and Siddhant Arora. “Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation”. Interspeech. [details here]
- Yui Sudo, Shakeel Muhammad, Kazuhiro Nakadai, Jiatong Shi and Shinji Watanabe. “Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection”. Interspeech. [details here]
- Brian Yan, Patrick Fernandes, Siddharth Dalmia, Jiatong Shi, Yifan Peng, Dan Berrebbi, Xinyi Wang, Graham Neubig, and Shinji Watanabe. 2022. “CMU’s IWSLT 2022 Dialect Speech Translation System”, Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022) [details here]
- Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee. 2022. “SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities”. Proceedings of the Annual Meeting of the Association for Computational Linguistics. [details here]
- Tao Qian, Jiatong Shi, Shuai Guo, Peter Wu, and Qin Jin. 2022. “Training strategies for automatic song writing: a perspective with a unified framework”. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). [details here]
- Chunlei Zhang, Jiatong Shi, Chao Weng, Meng Yu, and Dong Yu. 2022. “Towards End-to-end Speaker Diarization with Generalized Neural Speaker Clustering”. ICASSP. [details here]
- Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu. 2022. “An Investigation of Neural Uncertainty Estimation for Target Speaker Extraction Equipped RNN Transducer”. Computer Speech and Language (CSL). [details here]
- Peter Wu, Paul Pu Liang, Jiatong Shi, Ruslan Salakhutdinov, Shinji Watanabe, and Louis-Philippe Morency. 2021. “Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks”. APSIPA. [details here]
- Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, and Alan W Black. 2021. “Acoustic Cross-lingual Transfer using Language Similarity”. ASRU. [details here]
- Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Gu, Jiatong Shi, Kevin Duh, Shinji Watanabe. 2021. “ESPnet-ST IWSLT 2021 Offline Speech Translation System”. IWSLT. [details here]
- Shu-wen Yang, Po-Han Chi*, Yung-Sung Chuang*, Cheng-I Jeff Lai*, Kushal Lakhotia*, Yist Y Lin*, Andy T Liu*, Jiatong Shi*, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee. 2021. “SUPERB: Speech processing Universal PERformance Benchmark”. Interspeech. [details here]
- Jiatong Shi, Jonathan D. Amith, Xuankai Chang, Siddharth Dalmia, Brian Yan, and Shinji Watanabe. 2021. “Highland Puebla Nahuatl–Spanish Speech Translation Corpus for Endangered Language Documentation”. AmericasNLP. [details here]
- Jonathan D. Amith, Jiatong Shi, Rey Castillo García. 2021. “End-to-End Automatic Speech Recognition: Its Impact on the Workflow for Documenting Yoloxóchitl Mixtec”. AmericasNLP. [details here]
- Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu. 2021. “Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation”. ICASSP. [details here]
- Jiatong Shi*, Shuai Guo*, Nan Huo, Yuekai Zhang, and Qin Jin. 2021. “Sequence-to-sequence Singing Voice Synthesis with Perceptual Entropy Loss”. ICASSP. [details here]
- Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, Yuekai Zhang. 2021. “Recent Developments on ESPnet Toolkit Boosted by Conformer”. ICASSP. [details here]
- Jiatong Shi, Jonathan D. Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, and Shinji Watanabe. 2021. “Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec”. EACL. [details here]
- Jiatong Shi, Kunlin Yang, Wei Xu, and Mingming Wang. 2021. "Leveraging deep learning with audio analytics to predict the success of crowdfunding projects". The Journal of Supercomputing. https://doi.org/10.1007/s11227-020-03595-2 [details here]
- Jiatong Shi, Nan Huo, and Qin Jin. 2020. "Context-aware Goodness of Pronunciation for Computer Assisted Pronunciation Training". Interspeech. [details here]
- Wenxin Hou, Yue Dong, Bairong Zhuang, Longfei Yang, Jiatong Shi, and Takahiro Shinozaki. 2020. "Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning". Interspeech. [details here]
- Jiatong Shi, Wei Du, and Wei Xu. 2018. "Identifying Impact Factors of Question Quality in Online Health Q&A Communities: an Empirical Analysis on MedHelp". PACIS. [details here]
- Jiatong Shi. 2019. "Computer Assisted Language Learning System for Young English Learner". Undergraduate Thesis, Renmin University of China.
Awards
| Award | Year |
| --- | --- |
| CMU Presidential Fellowship | 2022 |
| PhD Fellowship at LTI, CMU | 2021 |
| Special Award in the ‘National University Data-driven Innovation & Research Competition’ (1/594) | 2018 |
| National-level project in the ‘Training Programs of Innovation and Entrepreneurship for Undergraduates’ | 2017 |
| Scholarship of Academic Excellence (top 20%) | 2016, 2017 & 2018 |
| Golden Prize in the Beijing Art Festival of Undergraduates (Accordion Contest) | 2016 |