WaveGlow

A PyTorch implementation of WaveGlow: A Flow-based Generative Network for Speech Synthesis.

Quick Start:

  1. Install requirements:

     pip install -r requirements.txt

  2. Download the dataset:

     wget http://festvox.org/cmu_arctic/cmu_arctic/packed/cmu_us_slt_arctic-0.95-release.tar.bz2
     tar xf cmu_us_slt_arctic-0.95-release.tar.bz2

  3. Extract features (the feature extraction pipeline is the same as in Tacotron).

  4. Train with the default hyperparameters:

     python train.py

  5. Synthesize from a trained model:

     python generate.py --checkpoint=/path/to/model --local_condition_file=/path/to/local_condition

Notes:

  • This is not an official implementation; some details may not be correct.
  • Work in progress.
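The model is a normalizing flow built from affine coupling layers, which are invertible by construction: half the channels pass through unchanged and parameterize a scale and shift for the other half. A minimal NumPy sketch of the idea (`toy_wn` stands in for the WaveNet-like conditioning network and is purely illustrative, not this repo's code):

```python
import numpy as np

def coupling_forward(x, wn):
    """One affine coupling step: transform half the channels
    conditioned on the other, untouched half."""
    xa, xb = np.split(x, 2, axis=0)          # split along channels
    log_s, t = wn(xa)                        # any function of xa works
    return np.concatenate([xa, xb * np.exp(log_s) + t], axis=0)

def coupling_inverse(y, wn):
    """Exact inverse: xa is unchanged, so log_s and t can be recomputed."""
    ya, yb = np.split(y, 2, axis=0)
    log_s, t = wn(ya)
    return np.concatenate([ya, (yb - t) * np.exp(-log_s)], axis=0)

# Hypothetical stand-in for the WaveNet-like conditioning network.
def toy_wn(xa):
    return np.tanh(xa), 0.5 * xa

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))             # (channels, time)
y = coupling_forward(x, toy_wn)
x_rec = coupling_inverse(y, toy_wn)
print(np.allclose(x, x_rec))                 # → True
```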

Comments
  • How to choose the upsampling rate for conditioning?

    @npuichigo

    I have local conditioning of shape 1 x 1000 x 50 (for 1 s of speech) with input audio at a 16 kHz sampling rate. Which upsampling factor should I choose, and how can I estimate it? Thanks.

    opened by ajinkyakulkarni14 5
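For what it's worth, the upsampling factor is simply the ratio of the audio sampling rate to the conditioning frame rate. With the numbers from the question (1000 frames per second of 16 kHz audio):

```python
# Values taken from the question above.
sample_rate = 16000          # audio samples per second
frames_per_second = 1000     # conditioning frames per second (1000 frames / 1 s)

# Each conditioning frame must be expanded to cover this many samples.
upsampling_factor = sample_rate // frames_per_second
print(upsampling_factor)     # → 16

# Nearest-neighbour upsampling: repeat each frame upsampling_factor times,
# so the conditioning sequence lines up with the audio sample by sample.
frames = list(range(frames_per_second))
upsampled = [f for f in frames for _ in range(upsampling_factor)]
print(len(upsampled) == sample_rate)  # → True
```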
  • Undefined name: Where is _GetFileAndLine() defined?

    https://github.com/npuichigo/waveglow/search?q=_GetFileAndLine

    flake8 testing of https://github.com/npuichigo/waveglow on Python 3.7.1

    $ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

    ./waveglow/logging.py:144:38: F821 undefined name '_GetFileAndLine'
        count = _GetNextLogCountPerToken(_GetFileAndLine())
                                         ^
    ./waveglow/logging.py:157:38: F821 undefined name '_GetFileAndLine'
        count = _GetNextLogCountPerToken(_GetFileAndLine())
                                         ^
    2     F821 undefined name '_GetFileAndLine'
    2
    
    opened by cclauss 1
  • Is the early output implementation the same as in the paper?

    Hi,

    It's nice work! I see that you output half of the current channels at each early output, but in the paper they output a constant number of channels (2 in their case). Am I wrong, or is there some trick?

    opened by dhgrs 1
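The question is about channel bookkeeping: in the paper, a constant number of channels (2) is emitted to the latent every 4 coupling layers, rather than half of the current channels. A small sketch of the constant-channel scheme (numbers follow the paper's configuration; this is illustrative, not this repo's code):

```python
def early_output_channels(n_channels, n_flows, every_k=4, n_early=2):
    """Channels emitted to z at each early output under the paper's
    scheme: a constant n_early channels every every_k flows, with
    whatever remains emitted after the final flow."""
    emitted = []
    remaining = n_channels
    for flow in range(1, n_flows + 1):
        if flow % every_k == 0 and flow != n_flows:
            emitted.append(n_early)
            remaining -= n_early
    emitted.append(remaining)   # everything left after the last flow
    return emitted

# 8 grouped audio channels, 12 flows, 2 channels out every 4 flows:
print(early_output_channels(8, 12))   # → [2, 2, 4]
```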
  • Negative loss value

    I'm using your code to train on my own dataset. The code runs fine, but the loss value is negative. Is this normal?

    INFO:pytorch:Let's use 1 GPUs!
    C:\Python35\lib\site-packages\torch\nn\modules\upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead.
      warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")
    INFO:pytorch:[1,   1] loss: 8774.103
    INFO:pytorch:[1,   2] loss: 247.321
    INFO:pytorch:[1,   3] loss: 2267.624
    INFO:pytorch:[1,   4] loss: 2223.914
    INFO:pytorch:[1,   5] loss: 21230.785
    INFO:pytorch:[1,   6] loss: -4516.562
    INFO:pytorch:[1,   7] loss: -1636.356
    INFO:pytorch:[1,   8] loss: -1802.424
    INFO:pytorch:[1,   9] loss: -973.106
    INFO:pytorch:[1,  10] loss: -2390.099
    INFO:pytorch:[1,  11] loss: -3500.059
    INFO:pytorch:[1,  12] loss: -2850.755
    INFO:pytorch:[1,  13] loss: -4785.271
    INFO:pytorch:[1,  14] loss: -5666.863
    INFO:pytorch:[1,  15] loss: -6398.563
    INFO:pytorch:[1,  16] loss: 6809.037
    INFO:pytorch:[1,  17] loss: 6204.096
    INFO:pytorch:[1,  18] loss: -9287.712
    INFO:pytorch:[1,  19] loss: -3614.344
    
    opened by shartoo 3
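A negative loss is expected here: the training objective is a negative log-likelihood under a continuous density, and a continuous density can exceed 1, which makes its log positive and the NLL negative. A stdlib-only illustration with a sharply peaked Gaussian:

```python
import math

def gaussian_log_pdf(x, mu=0.0, sigma=0.01):
    """Log-density of a univariate Gaussian N(mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) \
           - (x - mu) ** 2 / (2 * sigma ** 2)

# Near the mode of a narrow Gaussian the density is far above 1,
# so the log-density is positive and the NLL (the flow's loss) is negative.
nll = -gaussian_log_pdf(0.0)
print(nll)   # → about -3.69
```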
Owner
Yuchao Zhang
speech synthesis/machine learning