WaveGlow

A PyTorch implementation of WaveGlow: A Flow-based Generative Network for Speech Synthesis.

Quick Start:

  1. Install requirements:

     pip install -r requirements.txt

  2. Download the dataset:

     wget http://festvox.org/cmu_arctic/cmu_arctic/packed/cmu_us_slt_arctic-0.95-release.tar.bz2
     tar xf cmu_us_slt_arctic-0.95-release.tar.bz2

  3. Extract features (the feature extraction pipeline is the same as in Tacotron).

  4. Train with the default hyperparameters:

     python train.py

  5. Synthesize from a trained model:

     python generate.py --checkpoint=/path/to/model --local_condition_file=/path/to/local_condition

Notes:

  • This is not an official implementation; some details may not be correct.
  • Work in progress.
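The model is a normalizing flow built from affine coupling layers, which are invertible by construction: half the channels pass through unchanged and parameterize a scale and shift for the other half. A minimal NumPy sketch of the idea (`toy_wn` stands in for the WaveNet-like conditioning network and is purely illustrative, not this repo's code):

```python
import numpy as np

def coupling_forward(x, wn):
    """One affine coupling step: transform half the channels
    conditioned on the other, untouched half."""
    xa, xb = np.split(x, 2, axis=0)          # split along channels
    log_s, t = wn(xa)                        # any function of xa works
    return np.concatenate([xa, xb * np.exp(log_s) + t], axis=0)

def coupling_inverse(y, wn):
    """Exact inverse: xa is unchanged, so log_s and t can be recomputed."""
    ya, yb = np.split(y, 2, axis=0)
    log_s, t = wn(ya)
    return np.concatenate([ya, (yb - t) * np.exp(-log_s)], axis=0)

# Hypothetical stand-in for the WaveNet-like conditioning network.
def toy_wn(xa):
    return np.tanh(xa), 0.5 * xa

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))             # (channels, time)
y = coupling_forward(x, toy_wn)
x_rec = coupling_inverse(y, toy_wn)
print(np.allclose(x, x_rec))                 # → True
```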

Comments
  • How to choose the upsampling rate for conditioning?

    @npuichigo

    I have local conditioning of shape 1 x 1000 x 50 (for 1 s of speech) with input audio at a 16 kHz sampling rate. Which upsampling factor should I choose, and how can I estimate it? Thanks.

    opened by ajinkyakulkarni14 5
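For what it's worth, the upsampling factor is simply the ratio of the audio sampling rate to the conditioning frame rate. With the numbers from the question (1000 frames per second of 16 kHz audio):

```python
# Values taken from the question above.
sample_rate = 16000          # audio samples per second
frames_per_second = 1000     # conditioning frames per second (1000 frames / 1 s)

# Each conditioning frame must be expanded to cover this many samples.
upsampling_factor = sample_rate // frames_per_second
print(upsampling_factor)     # → 16

# Nearest-neighbour upsampling: repeat each frame upsampling_factor times,
# so the conditioning sequence lines up with the audio sample by sample.
frames = list(range(frames_per_second))
upsampled = [f for f in frames for _ in range(upsampling_factor)]
print(len(upsampled) == sample_rate)  # → True
```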
  • Undefined name: Where is _GetFileAndLine() defined?

    https://github.com/npuichigo/waveglow/search?q=_GetFileAndLine

    flake8 testing of https://github.com/npuichigo/waveglow on Python 3.7.1

    $ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

    ./waveglow/logging.py:144:38: F821 undefined name '_GetFileAndLine'
        count = _GetNextLogCountPerToken(_GetFileAndLine())
                                         ^
    ./waveglow/logging.py:157:38: F821 undefined name '_GetFileAndLine'
        count = _GetNextLogCountPerToken(_GetFileAndLine())
                                         ^
    2     F821 undefined name '_GetFileAndLine'
    2
    
    opened by cclauss 1
  • Is the early output implementation the same as in the paper?

    Hi,

    It's nice work! I see that you output half of the current channels at each early output, but in the paper they output a constant number of channels (2 in their case). Am I wrong, or is there some trick?

    opened by dhgrs 1
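The question is about channel bookkeeping: in the paper, a constant number of channels (2) is emitted to the latent every 4 coupling layers, rather than half of the current channels. A small sketch of the constant-channel scheme (numbers follow the paper's configuration; this is illustrative, not this repo's code):

```python
def early_output_channels(n_channels, n_flows, every_k=4, n_early=2):
    """Channels emitted to z at each early output under the paper's
    scheme: a constant n_early channels every every_k flows, with
    whatever remains emitted after the final flow."""
    emitted = []
    remaining = n_channels
    for flow in range(1, n_flows + 1):
        if flow % every_k == 0 and flow != n_flows:
            emitted.append(n_early)
            remaining -= n_early
    emitted.append(remaining)   # everything left after the last flow
    return emitted

# 8 grouped audio channels, 12 flows, 2 channels out every 4 flows:
print(early_output_channels(8, 12))   # → [2, 2, 4]
```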
  • Negative loss value

    I'm using your code to train on my own dataset. The code runs fine, but the loss value is negative. Is this normal?

    INFO:pytorch:Let's use 1 GPUs!
    C:\Python35\lib\site-packages\torch\nn\modules\upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead.
      warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")
    INFO:pytorch:[1,   1] loss: 8774.103
    INFO:pytorch:[1,   2] loss: 247.321
    INFO:pytorch:[1,   3] loss: 2267.624
    INFO:pytorch:[1,   4] loss: 2223.914
    INFO:pytorch:[1,   5] loss: 21230.785
    INFO:pytorch:[1,   6] loss: -4516.562
    INFO:pytorch:[1,   7] loss: -1636.356
    INFO:pytorch:[1,   8] loss: -1802.424
    INFO:pytorch:[1,   9] loss: -973.106
    INFO:pytorch:[1,  10] loss: -2390.099
    INFO:pytorch:[1,  11] loss: -3500.059
    INFO:pytorch:[1,  12] loss: -2850.755
    INFO:pytorch:[1,  13] loss: -4785.271
    INFO:pytorch:[1,  14] loss: -5666.863
    INFO:pytorch:[1,  15] loss: -6398.563
    INFO:pytorch:[1,  16] loss: 6809.037
    INFO:pytorch:[1,  17] loss: 6204.096
    INFO:pytorch:[1,  18] loss: -9287.712
    INFO:pytorch:[1,  19] loss: -3614.344
    
    opened by shartoo 3
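A negative loss is expected here: the training objective is a negative log-likelihood under a continuous density, and a continuous density can exceed 1, which makes its log positive and the NLL negative. A stdlib-only illustration with a sharply peaked Gaussian:

```python
import math

def gaussian_log_pdf(x, mu=0.0, sigma=0.01):
    """Log-density of a univariate Gaussian N(mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) \
           - (x - mu) ** 2 / (2 * sigma ** 2)

# Near the mode of a narrow Gaussian the density is far above 1,
# so the log-density is positive and the NLL (the flow's loss) is negative.
nll = -gaussian_log_pdf(0.0)
print(nll)   # → about -3.69
```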
Owner
Yuchao Zhang
speech synthesis/machine learning