GradTTS
Unofficial PyTorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech" (arXiv)
About this repo
This is an unofficial implementation of GradTTS, built on top of GlowTTS (https://github.com/jaywalnut310/glow-tts). We replace the GlowDecoder with a DiffusionDecoder that follows the settings of the original paper. For convenience, we also replace torch.distributed with horovod. fp16 training is not currently used.
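To give a feel for what a diffusion decoder works with, here is a minimal sketch of the forward (noising) process described in the Grad-TTS paper, simplified to identity covariance and a linear noise schedule. It is not taken from this repository: the function names are hypothetical, and the default values beta_0=0.05 and beta_1=20.0 are assumptions that should be checked against the paper and the configs in egs/.

```python
import math
import random


def cumulative_beta(t, beta_0=0.05, beta_1=20.0):
    """Integral over [0, t] of the linear schedule beta_s = beta_0 + (beta_1 - beta_0) * s.

    beta_0 and beta_1 are assumed defaults, not values read from this repo.
    """
    return beta_0 * t + 0.5 * (beta_1 - beta_0) * t ** 2


def forward_diffuse(x0, mu, t, rng=random):
    """Sample x_t given x0 for a mean-reverting forward process with identity covariance.

    x0 and mu are flat lists of floats (for example, one mel frame and its
    text-conditioned mean). As t grows, the sample drifts from x0 towards mu
    while Gaussian noise of increasing variance is added.
    """
    b = cumulative_beta(t)
    decay = math.exp(-0.5 * b)           # how much of (x0 - mu) survives at time t
    std = math.sqrt(1.0 - math.exp(-b))  # noise standard deviation at time t
    return [m + (x - m) * decay + std * rng.gauss(0.0, 1.0) for x, m in zip(x0, mu)]
```

At t = 0 the sample equals x0, and near t = 1 it is close to a draw from a unit-variance Gaussian centred on mu. The decoder is trained to reverse this process.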
Training and inference
Please go to the egs/ folder and see run.sh and inference_waveglow_vocoder.py for example usage. Before training, download and extract the LJ Speech dataset, then rename the dataset folder or create a link to it: ln -s /path/to/LJSpeech-1.1/wavs DUMMY
Then build the Monotonic Alignment Search code (Cython): cd monotonic_align; python setup.py build_ext --inplace
Before inference, download the waveglow checkpoint from download_link and put it into the waveglow folder.
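The dataset link step can also be done from Python, which helps on systems where ln is unavailable or when the setup is scripted. The following is a minimal sketch: link_dataset is a hypothetical helper, not part of this repository, and it assumes the DUMMY link name used in the ln -s command.

```python
import os


def link_dataset(wavs_dir, link_name="DUMMY"):
    """Create a symlink link_name -> wavs_dir, mirroring `ln -s <wavs_dir> DUMMY`.

    Returns the link path. An existing path at link_name is left untouched,
    so the function is safe to call more than once.
    """
    if not os.path.isdir(wavs_dir):
        raise FileNotFoundError(f"dataset folder not found: {wavs_dir}")
    if os.path.lexists(link_name):
        return link_name
    os.symlink(os.path.abspath(wavs_dir), link_name)
    return link_name
```

A call such as link_dataset("/path/to/LJSpeech-1.1/wavs") would be run from the directory where the DUMMY link is expected. On Windows, creating symlinks may require elevated privileges.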
Reference Materials
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Score-Based Generative Modeling through Stochastic Differential Equations
Authors
Heyang Xue (https://github.com/WelkinYang) and Qicong Xie (https://github.com/QicongXie)