A PyTorch implementation of WaveGlow: A Flow-based Generative Network for Speech Synthesis

Overview

WaveGlow (Prenger et al., 2018) is a flow-based generative network that synthesizes speech waveforms from mel-spectrogram conditioning, combining ideas from Glow and WaveNet. This repository provides a PyTorch implementation.

Quick Start:

  1. Install requirements:

     pip install -r requirements.txt

  2. Download the dataset:

     wget http://festvox.org/cmu_arctic/cmu_arctic/packed/cmu_us_slt_arctic-0.95-release.tar.bz2
     tar xf cmu_us_slt_arctic-0.95-release.tar.bz2

  3. Extract features: the feature-extraction pipeline is the same as in Tacotron (see the sketch after this list).

  4. Train with the default hyperparameters:

     python train.py

  5. Synthesize from a trained model:

     python generate.py --checkpoint=/path/to/model --local_condition_file=/path/to/local_condition
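Since the repository defers to Tacotron's pipeline for step 3, here is an illustrative sketch of a typical Tacotron-style log-mel extraction. Every parameter value below (sample rate, FFT size, hop length, mel count) is an assumption, not necessarily the repo's actual settings:

    import librosa
    import numpy as np

    # Hypothetical Tacotron-style log-mel extraction; parameter values
    # are illustrative assumptions, not this repo's actual configuration.
    def extract_mel(wav_path, sr=16000, n_fft=1024, hop_length=256, n_mels=80):
        wav, _ = librosa.load(wav_path, sr=sr)
        mel = librosa.feature.melspectrogram(
            y=wav, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
        return np.log(np.clip(mel, 1e-5, None))  # (n_mels, frames) log-mels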

Notes:

  • This is not an official implementation; some details may not match the paper exactly.
  • Work in progress.

Comments
  • How to choose the upsampling rate for conditioning?

    @npuichigo

    I have local conditioning of shape 1 × 1000 × 50 (for 1 s of speech) with input audio at a 16 kHz sampling rate. Which upsampling factor should I choose, and how do I estimate it? Thanks

    opened by ajinkyakulkarni14 5
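    For reference, the upsampling factor is just the hop size in samples, i.e. sample_rate / frames_per_second. A minimal sketch using the numbers from this question (16 kHz audio, 1000 conditioning frames per second, giving a factor of 16; the conditioning is transposed to channels-first for interpolation):

        import torch
        import torch.nn.functional as F

        sample_rate = 16000       # from the question above
        frames_per_second = 1000  # 1000 frames cover 1 s of audio

        # Each conditioning frame must cover this many audio samples,
        # so this is the factor the conditioning must be upsampled by.
        upsample_factor = sample_rate // frames_per_second  # 16

        cond = torch.randn(1, 50, 1000)  # (batch, channels, frames)
        cond_up = F.interpolate(cond, scale_factor=upsample_factor)
        print(cond_up.shape)  # torch.Size([1, 50, 16000])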
  • Undefined name: Where is _GetFileAndLine() defined?

    https://github.com/npuichigo/waveglow/search?q=_GetFileAndLine

    flake8 testing of https://github.com/npuichigo/waveglow on Python 3.7.1

    $ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

    ./waveglow/logging.py:144:38: F821 undefined name '_GetFileAndLine'
        count = _GetNextLogCountPerToken(_GetFileAndLine())
                                         ^
    ./waveglow/logging.py:157:38: F821 undefined name '_GetFileAndLine'
        count = _GetNextLogCountPerToken(_GetFileAndLine())
                                         ^
    2     F821 undefined name '_GetFileAndLine'
    2
    
    opened by cclauss 1
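    For context, waveglow/logging.py appears to be adapted from TensorFlow's tf_logging module, where _GetFileAndLine() walks the stack to find the first caller outside the logging file itself. A hedged sketch of such a helper (an assumption about the missing code, not the author's original):

        import sys

        _this_file = __file__  # the logging module's own file

        def _GetFileAndLine():
            """Return (filename, lineno) of the first caller outside
            this file; a sketch of the missing helper, not the original."""
            frame = sys._getframe()
            while frame:
                if frame.f_code.co_filename != _this_file:
                    return (frame.f_code.co_filename, frame.f_lineno)
                frame = frame.f_back
            return ('<unknown>', 0)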
  • Is the early-output implementation the same as in the paper?

    Hi,

    Nice work! I noticed that you output half of the current channels at each early output, but in the paper they output a constant number of channels (2, in their case). Am I wrong, or is there some trick?

    opened by dhgrs 1
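    For comparison, the paper's scheme (and the official NVIDIA code, with defaults n_early_every=4 and n_early_size=2) splits off a fixed two channels every four flows rather than half of the remaining channels. A minimal sketch of that constant-size scheme, with the flow steps themselves elided:

        import torch

        n_flows, n_early_every, n_early_size = 12, 4, 2
        audio = torch.randn(1, 8, 2000)  # channels after the initial squeeze

        early_outputs = []
        for k in range(n_flows):
            if k % n_early_every == 0 and k > 0:
                # A constant n_early_size channels go to the output,
                # instead of half of the remaining channels.
                early_outputs.append(audio[:, :n_early_size, :])
                audio = audio[:, n_early_size:, :]
            # ... invertible 1x1 conv + affine coupling would transform
            # `audio` here ...
        z = torch.cat(early_outputs + [audio], dim=1)
        print(z.shape)  # torch.Size([1, 8, 2000]); all channels preserved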
  • Negative loss value

    I'm using your code to train on my dataset. The code runs fine, but the loss value is negative. Is this normal?

    INFO:pytorch:Let's use 1 GPUs!
    C:\Python35\lib\site-packages\torch\nn\modules\upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead.
      warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")
    INFO:pytorch:[1,   1] loss: 8774.103
    INFO:pytorch:[1,   2] loss: 247.321
    INFO:pytorch:[1,   3] loss: 2267.624
    INFO:pytorch:[1,   4] loss: 2223.914
    INFO:pytorch:[1,   5] loss: 21230.785
    INFO:pytorch:[1,   6] loss: -4516.562
    INFO:pytorch:[1,   7] loss: -1636.356
    INFO:pytorch:[1,   8] loss: -1802.424
    INFO:pytorch:[1,   9] loss: -973.106
    INFO:pytorch:[1,  10] loss: -2390.099
    INFO:pytorch:[1,  11] loss: -3500.059
    INFO:pytorch:[1,  12] loss: -2850.755
    INFO:pytorch:[1,  13] loss: -4785.271
    INFO:pytorch:[1,  14] loss: -5666.863
    INFO:pytorch:[1,  15] loss: -6398.563
    INFO:pytorch:[1,  16] loss: 6809.037
    INFO:pytorch:[1,  17] loss: 6204.096
    INFO:pytorch:[1,  18] loss: -9287.712
    INFO:pytorch:[1,  19] loss: -3614.344
    
    opened by shartoo 3
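    For context: the WaveGlow objective is a negative log-likelihood of continuous audio, which includes log-determinant terms and is therefore unbounded below, so negative values are expected once training progresses. A minimal sketch of the loss (sigma=1.0, a common default) showing how it goes negative with toy values:

        import torch

        def waveglow_loss(z, log_s_list, log_det_w_list, sigma=1.0):
            # Negative log-likelihood from the paper: z'z / (2*sigma^2)
            # minus the coupling and 1x1-conv log-determinant terms.
            loss = torch.sum(z * z) / (2 * sigma ** 2)
            for log_s in log_s_list:
                loss = loss - torch.sum(log_s)
            for log_det_w in log_det_w_list:
                loss = loss - torch.sum(log_det_w)
            return loss / z.numel()  # averaged per element

        # Toy values: large coupling scales push the loss below zero,
        # which is normal for a continuous log-density.
        z = torch.zeros(1, 8, 100)
        print(waveglow_loss(z, [torch.full((1, 4, 100), 2.0)],
                            [torch.tensor(3.0)]))  # negative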
Owner

Yuchao Zhang (speech synthesis/machine learning)