Haohan Guo

PhD Student @ CUHK

The Chinese University of Hong Kong

Biography

Hello! I am Haohan Guo (郭浩瀚), a PhD student at CUHK, supervised by Prof. Helen Mei Ling MENG. Before that, I received my M.S. and B.S. degrees from Northwestern Polytechnical University, where I was supervised by Prof. Lei Xie, and then worked as a researcher at Sogou Inc. during 2020-2021. My current research topic is deep-learning-based speech synthesis. If you are interested in my work, feel free to contact me.

Interests

Speech & Audio Processing
Speech Synthesis
Voice Conversion
Audio Generation

Education

PhD in Computer Science, 2021-
The Chinese University of Hong Kong
MSc in Computer Science, 2017-2020
Northwestern Polytechnical University
BSc in Computer Science, 2013-2017
Northwestern Polytechnical University

Publications

MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE based Neural TTS

Journal paper accepted by TASLP

Haohan Guo, Fenglong Xie, Frank K. Soong, Xixin Wu, Helen Meng

Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations

Submitted to arXiv

Haohan Guo, Fenglong Xie, Hui Lu, Xixin Wu, Helen Meng

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS

Conference paper accepted by INTERSPEECH 2022

Haohan Guo, Fenglong Xie, Frank K. Soong, Xixin Wu, Helen Meng

A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS

Conference paper accepted by INTERSPEECH 2022

Haohan Guo, Hui Lu, Xixin Wu, Helen Meng

Improving Adversarial Waveform Generation based Singing Voice Conversion with Harmonic Signals

Conference paper accepted by ICASSP 2022

Haohan Guo, Zhiping Zhou, Fanbo Meng, Kai Liu

Conversational End-to-End TTS for Voice Agents

Conference paper accepted by SLT 2021

Haohan Guo, Shaofei Zhang, Frank K. Soong, Lei He, Lei Xie

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training

Conference paper submitted to arXiv

Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu

A New GAN-based End-to-End TTS Training Algorithm

Conference paper accepted by INTERSPEECH 2019

Haohan Guo, Frank K. Soong, Lei He, Lei Xie

Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS

Conference paper accepted by INTERSPEECH 2019

Haohan Guo, Frank K. Soong, Lei He, Lei Xie

Work Experience

lab, internship, full-time employee

Applied Scientist Intern

Amazon

Jun 2023 – Nov 2023 Cambridge, UK

Worked as an applied scientist intern developing a large-scale TTS system based on large language models (LLMs).

Research Intern

Xiaohongshu

Aug 2020 – May 2022 Beijing, China

Worked as a research intern investigating the application of speech representations in TTS.

Researcher

Sogou

Dec 2020 – Jul 2021 Beijing, China

Worked as a researcher on singing voice conversion. We aimed to develop a commercial singing voice conversion system that converts an arbitrary singing voice to a target timbre, with both high sound quality and accurate melody expression.

Research Intern

Tencent AI Lab

May 2020 – Dec 2020 Beijing, China

Researched multi-singer singing voice conversion. We proposed a MelGAN-based end-to-end PPG-SVC model that significantly improves sound quality and singer similarity over the conventional PPG-SVC framework. The work is summarized in the paper Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training.

Research Intern

Microsoft Research Asia & Microsoft STCA

May 2018 – Sep 2019 Beijing, China

Supervised by Frank K. Soong and Lei He. We aimed to improve the robustness and naturalness of end-to-end TTS. Two main works were published at INTERSPEECH 2019: A New GAN-based End-to-End TTS Training Algorithm, and Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS. We also investigated conversational TTS using the end-to-end approach; that work was published at SLT 2021 as Conversational End-to-End TTS for Voice Agents.