Jiatong Shi's Homepage

Publications (* denotes equal contribution)

Jiatong Shi, Hye-jin Shim, Jinchuan Tian, Siddhant Arora, Haibin Wu, Darius Petermann, Jia Qi Yip, You Zhang, Yuxun Tang, Wangyou Zhang, Dareen Safar Alharthi, Yichen Huang, Koichi Saito, Jionghao Han, Yiwen Zhao, Chris Donahue, and Shinji Watanabe. 2025. “VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music”. NAACL Demo. [details here]
Jinchuan Tian, Chunlei Zhang, Jiatong Shi, Hao Zhang, Jianwei Yu, Shinji Watanabe, and Dong Yu. 2025. “Preference Alignment Improves Language Model-Based TTS”. ICASSP. [details here]
Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharthi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander Liu, Bhiksha Raj, Qin Jin, Ruihua Song, and Shinji Watanabe. 2024. “ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech”. SLT. [details here]
Yifeng Yu, Jiatong Shi, Yuning Wu, and Shinji Watanabe. 2024. “VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation”. SLT. [details here]
Shih-Heng Wang, Jiatong Shi, Chien-yu Huang, Shinji Watanabe, and Hung-yi Lee. 2024. “Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition”. SLT. [details here]
Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, and Shinji Watanabe. 2024. “ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration”. [details here]
You Zhang, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. 2024. “SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge”. SLT. [details here]
William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jiatong Shi, Jinchuan Tian, Xuankai Chang, Soumi Maiti, Karen Livescu, and Shinji Watanabe. “Towards Robust Speech Representation Learning for Thousands of Languages”. EMNLP. [details here]
Kristin Qi, Jiatong Shi, Caroline Summerour, John Batsis, and Xiaohui Liang. 2024. “Exploiting Longitudinal Speech Data via Voice Assistant Systems for Early Detection of Cognitive Decline”. IEEE Healthcom 2024. [details here]
Ibrahim Said Ahmad, Antonios Anastasopoulos, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, William Chen, Qianqian Dong, Marcello Federico, Barry Haddow, Dávid Javorský, Mateusz Krubiński, Tsz Kim Lam, Xutai Ma, Prashant Mathur, Evgeny Matusov, Chandresh Maurya, John McCrae, Kenton Murray, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, Atul Kr. Ojha, John Ortega, Sara Papi, Peter Polák, Adam Pospíšil, Pavel Pecina, Elizabeth Salesky, Nivedita Sethiya, Balaram Sarkar, Jiatong Shi, Claytone Sikasote, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Brian Thompson, Alex Waibel, Shinji Watanabe, Patrick Wilken, Petr Zemánek, and Rodolfo Zevallos. 2024. “Findings of the IWSLT 2024 Evaluation Campaign”. IWSLT. [details here]
Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, and Shinji Watanabe. 2024. “ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets”. Interspeech. [details here]
Jiatong Shi, Xutai Ma, Hirofumi Inaguma, Anna Sun, and Shinji Watanabe. 2024. “MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model”. Interspeech. [details here]
Jiatong Shi*, Yueqian Lin*, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, and Shinji Watanabe. 2024. “Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing”. Interspeech. [details here]
Kalvin Chang, Yi-Hui Chou, Jiatong Shi, Hsuan-Ming Chen, Nicole Holliday, Odette Scharenborg, and David R. Mortensen. 2024. “Self-supervised Speech Representations Still Struggle with African American Vernacular English”. Interspeech. [details here]
Tejes Srivastava, Jiatong Shi, William Chen, and Shinji Watanabe. 2024. “EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios”. Interspeech. [details here]
Jee-weon Jung*, Wangyou Zhang*, Jiatong Shi*, Zakaria Aldeneh, Takuya Higuchi, Barry-John Theobald, Ahmed Hussen Abdelaziz, and Shinji Watanabe. 2024. “ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models”. Interspeech. [details here]
Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, and Shinji Watanabe. 2024. “OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer”. Interspeech. [details here]
Shuhua Li, Qirong Mao, and Jiatong Shi. 2024. “PL-TTS: A Generalizable Prompt-based Diffusion TTS Augmented by Large Language Model”. Interspeech. [details here]
Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, and Qin Jin. 2024. “The Interspeech 2024 Challenge on Speech Processing Using Discrete Units”. Interspeech. [details here]
Yongyi Zang, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu, Wenxiao Zhao, Jing Guo, Tomoki Toda, and Zhiyao Duan. 2024. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection”. Interspeech. [details here]
Yuxun Tang, Yuning Wu, Jiatong Shi, and Qin Jin. 2024. “SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models”. Interspeech. [details here]
Yuning Wu, Chunlei Zhang, Jiatong Shi, Yuxun Tang, and Qin Jin. 2024. “TokSing: Singing Voice Synthesis based on Discrete Tokens”. Interspeech. [details here]
Yuxun Tang, Jiatong Shi, Yuning Wu, and Qin Jin. 2024. “An Exploration on Singing MOS Prediction”. ISCSLP. [details here]
Yuning Wu, Jiatong Shi, Yifeng Yu, Yuxun Tang, Qian Tao, Yueqian Lin, Jionghao Han, Xinyi Bai, Shinji Watanabe, and Qin Jin. 2024. “Muskits-ESPnet: a Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm”. ACMMM. [details here]
Yuning Wu*, Yifeng Yu*, Jiatong Shi, Tao Qian, and Qin Jin. 2024. “A Systematic Exploration of Joint-training for Singing Voice Synthesis”. ISCSLP. [details here]
Taiqi He, Kwanghee Choi, Lindia Tjuatja, Jiatong Shi, Nate Robinson, Graham Neubig, Shinji Watanabe, David Mortensen, and Lori Levin. 2024. “WAV2GLOSS: Generating Interlinear Glossed Text from Speech”. ACL. [details here]
Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Jinchuan Tian, Luping Liu, Zhenhui Ye, Ziyue Jiang, Xuankai Chang, Jiatong Shi, Chao Weng, Zhou Zhao, and Dong Yu. 2024. “Revisiting Voice Large Language Models as Scalable Multi-Lingual and Multi-Task Learners”. ACL.
Dongchao Yang*, Jinchuan Tian*, Xu Tan, Rongjie Huang, Songxiang Liu, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian, Xixin Wu, Zhou Zhao, and Helen Meng. 2024. “UniAudio: An Audio Foundation Model Toward Universal Audio Generation”. ICML. [details here]
Shu-wen Yang, Heng-Jui Chang*, Zili Huang*, Andy T. Liu*, Cheng-I Lai*, Haibin Wu*, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Abdelrahman Mohamed, Shang-Wen Li, Shinji Watanabe, and Hung-yi Lee. 2024. “A Large-Scale Evaluation of Speech Foundation Models”. TASLP. [details here]
Jiatong Shi, Hirofumi Inaguma, Xutai Ma, Ilia Kulikov, and Anna Sun. 2024. “Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction”. ICLR. [details here]
Rongjie Huang*, Mingze Li*, Dongchao Yang*, Jiatong Shi*, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Zhou Zhao, and Shinji Watanabe. 2024. “AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head”. AAAI. [details here]
Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, and Shinji Watanabe. 2024. “HuBERTopic: Enhancing semantic representation of HuBERT through self-supervision utilizing topic model”. ICASSP. [details here]
Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, and Hung-yi Lee. 2024. “Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech”. ICASSP. [details here]
Xuankai Chang, Brian Yan, Kwanghee Choi, Jeeweon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, and Hsiu-Hsuan Wang. 2024. “Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study”. ICASSP. [details here]
Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, Winston Ou, Alice Wen-Hsin Bi, Carol Yang, Bryan Y. Chen, Rong-Wei Pai, Po-Yen Yeh, Jo-Peng Chiang, Lu-Tshiann Phoann, Winnie Chang, Chenxuan Cui, Noel Chen, and Jiatong Shi. 2023. “Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus”. ASRU. [details here]
Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-Weon Jung, Soumi Maiti, and Shinji Watanabe. 2023. “Reproducing Whisper-Style Pre-training Using an Open-Source Toolkit and Publicly Available Data”. ASRU. [details here]
William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, and Shinji Watanabe. 2023. “Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning”. ASRU. [details here]
Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, and Shinji Watanabe. 2023. “Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond”. ASRU. [details here]
Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, and Tomoki Toda. 2023. “The Singing Voice Conversion Challenge 2023”. ASRU. [details here]
Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, and Shinji Watanabe. 2023. “4D: Joint Modeling of CTC, Attention, Transducer, and Mask-predict Decoders”. Interspeech. [details here]
Jiatong Shi, Dan Berrebbi*, William Chen*, En-Pei Hu*, Wei-Ping Huang*, Ho Lam Chung*, Xuankai Chang, Shang-Wen (Daniel) Li, Abdelrahman Mohamed, Hung-yi Lee, and Shinji Watanabe. 2023. “ML-SUPERB: Multilingual Speech Universal PERformance Benchmark”. Interspeech. [details here]
Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, and Shinji Watanabe. 2023. “Exploration on HuBERT with Multiple Resolution”. Interspeech. [details here]
Sweta Agrawal, Antonios Anastasopoulos, Luisa Bentivogli, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, Mingda Chen, William Chen, Khalid Choukri, Alexandra Chronopoulou, Anna Currey, Thierry Declerck, Qianqian Dong, Kevin Duh, Yannick Estève, Marcello Federico, Souhir Gahbiche, Barry Haddow, Benjamin Hsu, Phu Mon Htut, Hirofumi Inaguma, Dávid Javorský, John Judge, Yasumasa Kano, Tom Ko, Rishu Kumar, Pengwei Li, Xutai Ma, Prashant Mathur, Evgeny Matusov, Paul McNamee, John P. McCrae, Kenton Murray, Maria Nadejde, Satoshi Nakamura, Matteo Negri, Ha Nguyen, Jan Niehues, Xing Niu, Atul Kr. Ojha, John E. Ortega, Proyag Pal, Juan Pino, Lonneke van der Plas, Peter Polák, Elijah Rippeth, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Yun Tang, Brian Thompson, Kevin Tran, Marco Turchi, Alex Waibel, Mingxuan Wang, Shinji Watanabe, and Rodolfo Zevallos. 2023. “Findings of the IWSLT 2023 Evaluation Campaign”. IWSLT. [details here]
Brian Yan*, Jiatong Shi*, Soumi Maiti, William Chen, Xinjian Li, Yifan Peng, Siddhant Arora, and Shinji Watanabe. 2023. “CMU’s IWSLT 2023 Simultaneous Speech Translation System”. IWSLT. [details here]
Brian Yan*, Jiatong Shi*, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, and Shinji Watanabe. 2023. “ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit”. ACL Demo. [details here]
Tao Qian, Fan Lou, Jiatong Shi, Yuning Wu, Shuai Guo, Xiang Yin, and Qin Jin. 2023. “UniLG: A Unified Structure-aware Framework for Lyrics Generation”. ACL. [details here]
William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, and Shinji Watanabe. “Improving Massively Multilingual ASR with Auxiliary CTC Objectives”. ICASSP. [details here]
Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, and Hung-yi Lee. “Bridging Speech and Text Pre-trained Models with Unsupervised ASR”. ICASSP. [details here]
Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, and Shinji Watanabe. 2023. “Enhancing Speech-to-Speech Translation with Multiple TTS Targets”. ICASSP. [details here]
Dongji Gao*, Jiatong Shi*, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, and Sanjeev Khudanpur. 2023. “EURO: ESPnet Unsupervised ASR Open-source Toolkit”. ICASSP. [details here]
Yuning Wu, Jiatong Shi, Tao Qian, and Qin Jin. 2023. “PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor”. ICASSP. [details here]
Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, and Hung-yi Lee. “SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning”. SLT. [details here]
Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-yi Lee and Hao Tang. “On Compressing Sequences for Self-Supervised Speech Models”. SLT. [details here]
Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vĕra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nǎdejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, and Shinji Watanabe. 2022. “Findings of the IWSLT 2022 Evaluation Campaign”. IWSLT. [details here]
Jiatong Shi, George Saon, David Haws, Shinji Watanabe, and Brian Kingsbury. “VQ-T: RNN Transducers using Vector-Quantized Prediction Network States”. Interspeech. [details here]
Jiatong Shi, Shuai Guo, Tao Qian, Nan Huo, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, and Qin Jin. 2022. “Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis”. Interspeech. [details here]
Shuai Guo*, Jiatong Shi*, Tao Qian, Shinji Watanabe, and Qin Jin. “SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy”. Interspeech. [details here]
Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel López-Francisco, Jonathan Amith, and Shinji Watanabe. “Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation”. Interspeech. [details here]
Keqi Deng, Shinji Watanabe, Jiatong Shi, and Siddhant Arora. “Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation”. Interspeech. [details here]
Yui Sudo, Muhammad Shakeel, Kazuhiro Nakadai, Jiatong Shi, and Shinji Watanabe. “Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection”. Interspeech. [details here]
Brian Yan, Patrick Fernandes, Siddharth Dalmia, Jiatong Shi, Yifan Peng, Dan Berrebbi, Xinyi Wang, Graham Neubig, and Shinji Watanabe. 2022. “CMU’s IWSLT 2022 Dialect Speech Translation System”. Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022). [details here]
Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee. 2022. “SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities”. ACL. [details here]
Tao Qian, Jiatong Shi, Shuai Guo, Peter Wu, and Qin Jin. 2022. “Training strategies for automatic song writing: a perspective with a unified framework”. ICASSP. [details here]
Chunlei Zhang, Jiatong Shi, Chao Weng, Meng Yu, and Dong Yu. 2022. “Towards End-to-end Speaker Diarization with Generalized Neural Speaker Clustering”. ICASSP. [details here]
Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu. 2022. “An Investigation of Neural Uncertainty Estimation for Target Speaker Extraction Equipped RNN Transducer”. Computer Speech and Language (CSL). [details here]
Peter Wu, Paul Pu Liang, Jiatong Shi, Ruslan Salakhutdinov, Shinji Watanabe, and Louis-Philippe Morency. 2021. “Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks”. APSIPA. [details here]
Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, and Alan W Black. 2021. “Acoustic Cross-lingual Transfer using Language Similarity”. ASRU. [details here]
Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, and Shinji Watanabe. 2021. “ESPnet-ST IWSLT 2021 Offline Speech Translation System”. IWSLT. [details here]
Shu-wen Yang, Po-Han Chi*, Yung-Sung Chuang*, Cheng-I Jeff Lai*, Kushal Lakhotia*, Yist Y Lin*, Andy T Liu*, Jiatong Shi*, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee. 2021. “SUPERB: Speech processing Universal PERformance Benchmark”. Interspeech. [details here]
Jiatong Shi, Jonathan D. Amith, Xuankai Chang, Siddharth Dalmia, Brian Yan, and Shinji Watanabe. 2021. “Highland Puebla Nahuatl–Spanish Speech Translation Corpus for Endangered Language Documentation”. AmericasNLP. [details here]
Jonathan D. Amith, Jiatong Shi, and Rey Castillo García. 2021. “End-to-End Automatic Speech Recognition: Its Impact on the Workflow for Documenting Yoloxóchitl Mixtec”. AmericasNLP. [details here]
Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu. 2021. “Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation”. ICASSP. [details here]
Jiatong Shi*, Shuai Guo*, Nan Huo, Yuekai Zhang, and Qin Jin. 2021. “Sequence-to-sequence Singing Voice Synthesis with Perceptual Entropy Loss”. ICASSP. [details here]
Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, and Yuekai Zhang. 2021. “Recent Developments on ESPnet Toolkit Boosted by Conformer”. ICASSP. [details here]
Jiatong Shi, Jonathan D. Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, and Shinji Watanabe. “Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec”. EACL. [details here]
Jiatong Shi, Kunlin Yang, Wei Xu, and Mingming Wang. 2021. “Leveraging deep learning with audio analytics to predict the success of crowdfunding projects”. Journal of Supercomputing. https://doi.org/10.1007/s11227-020-03595-2 [details here]
Jiatong Shi, Nan Huo, and Qin Jin. 2020. “Context-aware Goodness of Pronunciation for Computer Assisted Pronunciation Training”. Interspeech. [details here]
Wenxin Hou, Yue Dong, Bairong Zhuang, Longfei Yang, Jiatong Shi, and Takahiro Shinozaki. 2020. “Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning”. Interspeech. [details here]
Jiatong Shi, Wei Du, and Wei Xu. 2018. “Identifying Impact Factors of Question Quality in Online Health Q&A Communities: an Empirical Analysis on MedHelp”. PACIS. [details here]
Jiatong Shi. 2019. “Computer Assisted Language Learning System for Young English Learner”. Undergraduate Thesis, Renmin University of China.

Awards

CMU Presidential Fellowship	2022
PhD fellowship at LTI, CMU	2021
Special Award in ‘National University Data-driven Innovation & Research Competition’ (1/594)	2018
National Level in ‘Training Programs of Innovation and Entrepreneurship for Undergraduates’	2017
Scholarship of Academic Excellence (top 20%)	2016, 2017 & 2018
Golden Prize in Beijing Art Festival of Undergraduates (Accordion Contest)	2016
