Short Bio and Recent Focuses

Jiatong Shi is a Ph.D. candidate in the Language Technologies Institute at Carnegie Mellon University, advised by Dr. Shinji Watanabe. His research focuses on speech representation learning and its applications across various speech processing tasks. He has authored over 70 publications in leading speech and machine learning conferences and has received multiple prestigious honors, including the Best Paper Award at ISCA Interspeech 2024, the Best Paper Award at EMNLP 2024, and the CMU Presidential Fellowship. Jiatong is also a strong advocate for open-source research, making significant contributions to major toolkits such as ESPnet, Muskits, and VERSA. He has played a key role in curating and releasing influential open datasets, including ML-SUPERB, SingMOS, KiSing, and several endangered language corpora, which have driven advancements in speech and music processing.

Here is a brief introduction to my recent focus and work. I am currently working on several projects spanning speech and music processing.

  • My main focus is speech. I have been working with Prof. Shinji Watanabe since September 2019. We currently focus on learning useful speech representations and applying them to various speech applications. We also actively collaborate on ESPnet.
  • I'm leading a group working on music processing (e.g., automatic songwriting, automatic music transcription, and singing voice synthesis). Recently, we have been focusing on singing voice synthesis. Please check out our open-source toolkit Muskits (now merged into ESPnet) on GitHub. This work is advised by Prof. Qin Jin.