A PyTorch implementation of WaveGlow: A Flow-based Generative Network for Speech Synthesis

Overview

WaveGlow (Prenger et al., 2018) is a flow-based generative network that synthesizes speech waveforms from mel-spectrogram conditioning, combining ideas from Glow and WaveNet. This repository provides a PyTorch implementation.

Quick Start:

  1. Install requirements:

     pip install -r requirements.txt

  2. Download the dataset:

     wget http://festvox.org/cmu_arctic/cmu_arctic/packed/cmu_us_slt_arctic-0.95-release.tar.bz2
     tar xf cmu_us_slt_arctic-0.95-release.tar.bz2

  3. Extract features: the feature-extraction pipeline is the same as in Tacotron (see the sketch after this list).

  4. Train with the default hyperparameters:

     python train.py

  5. Synthesize from a trained model:

     python generate.py --checkpoint=/path/to/model --local_condition_file=/path/to/local_condition
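Since the repository defers to Tacotron's pipeline for step 3, here is an illustrative sketch of a typical Tacotron-style log-mel extraction. Every parameter value below (sample rate, FFT size, hop length, mel count) is an assumption, not necessarily the repo's actual settings:

    import librosa
    import numpy as np

    # Hypothetical Tacotron-style log-mel extraction; parameter values
    # are illustrative assumptions, not this repo's actual configuration.
    def extract_mel(wav_path, sr=16000, n_fft=1024, hop_length=256, n_mels=80):
        wav, _ = librosa.load(wav_path, sr=sr)
        mel = librosa.feature.melspectrogram(
            y=wav, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
        return np.log(np.clip(mel, 1e-5, None))  # (n_mels, frames) log-mels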

Notes:

  • This is not an official implementation; some details may not match the paper exactly.
  • Work in progress.

Comments
  • How to choose the upsampling rate for conditioning?

    @npuichigo

    I have local conditioning of shape 1 × 1000 × 50 (for 1 s of speech) with input audio at a 16 kHz sampling rate. Which upsampling factor should I choose, and how do I estimate it? Thanks

    opened by ajinkyakulkarni14 5
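    For reference, the upsampling factor is just the hop size in samples, i.e. sample_rate / frames_per_second. A minimal sketch using the numbers from this question (16 kHz audio, 1000 conditioning frames per second, giving a factor of 16; the conditioning is transposed to channels-first for interpolation):

        import torch
        import torch.nn.functional as F

        sample_rate = 16000       # from the question above
        frames_per_second = 1000  # 1000 frames cover 1 s of audio

        # Each conditioning frame must cover this many audio samples,
        # so this is the factor the conditioning must be upsampled by.
        upsample_factor = sample_rate // frames_per_second  # 16

        cond = torch.randn(1, 50, 1000)  # (batch, channels, frames)
        cond_up = F.interpolate(cond, scale_factor=upsample_factor)
        print(cond_up.shape)  # torch.Size([1, 50, 16000])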
  • Undefined name: Where is _GetFileAndLine() defined?

    https://github.com/npuichigo/waveglow/search?q=_GetFileAndLine

    flake8 testing of https://github.com/npuichigo/waveglow on Python 3.7.1

    $ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

    ./waveglow/logging.py:144:38: F821 undefined name '_GetFileAndLine'
        count = _GetNextLogCountPerToken(_GetFileAndLine())
                                         ^
    ./waveglow/logging.py:157:38: F821 undefined name '_GetFileAndLine'
        count = _GetNextLogCountPerToken(_GetFileAndLine())
                                         ^
    2     F821 undefined name '_GetFileAndLine'
    2
    
    opened by cclauss 1
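    For context, waveglow/logging.py appears to be adapted from TensorFlow's tf_logging module, where _GetFileAndLine() walks the stack to find the first caller outside the logging file itself. A hedged sketch of such a helper (an assumption about the missing code, not the author's original):

        import sys

        _this_file = __file__  # the logging module's own file

        def _GetFileAndLine():
            """Return (filename, lineno) of the first caller outside
            this file; a sketch of the missing helper, not the original."""
            frame = sys._getframe()
            while frame:
                if frame.f_code.co_filename != _this_file:
                    return (frame.f_code.co_filename, frame.f_lineno)
                frame = frame.f_back
            return ('<unknown>', 0)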
  • Is the early-output implementation the same as in the paper?

    Hi,

    Nice work! I noticed that you output half of the current channels at each early output, but in the paper they output a constant number of channels (2, in their case). Am I wrong, or is there some trick?

    opened by dhgrs 1
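    For comparison, the paper's scheme (and the official NVIDIA code, with defaults n_early_every=4 and n_early_size=2) splits off a fixed two channels every four flows rather than half of the remaining channels. A minimal sketch of that constant-size scheme, with the flow steps themselves elided:

        import torch

        n_flows, n_early_every, n_early_size = 12, 4, 2
        audio = torch.randn(1, 8, 2000)  # channels after the initial squeeze

        early_outputs = []
        for k in range(n_flows):
            if k % n_early_every == 0 and k > 0:
                # A constant n_early_size channels go to the output,
                # instead of half of the remaining channels.
                early_outputs.append(audio[:, :n_early_size, :])
                audio = audio[:, n_early_size:, :]
            # ... invertible 1x1 conv + affine coupling would transform
            # `audio` here ...
        z = torch.cat(early_outputs + [audio], dim=1)
        print(z.shape)  # torch.Size([1, 8, 2000]); all channels preserved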
  • Negative loss value

    I'm using your code to train on my dataset. The code runs fine, but the loss value is negative. Is this normal?

    INFO:pytorch:Let's use 1 GPUs!
    C:\Python35\lib\site-packages\torch\nn\modules\upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead.
      warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")
    INFO:pytorch:[1,   1] loss: 8774.103
    INFO:pytorch:[1,   2] loss: 247.321
    INFO:pytorch:[1,   3] loss: 2267.624
    INFO:pytorch:[1,   4] loss: 2223.914
    INFO:pytorch:[1,   5] loss: 21230.785
    INFO:pytorch:[1,   6] loss: -4516.562
    INFO:pytorch:[1,   7] loss: -1636.356
    INFO:pytorch:[1,   8] loss: -1802.424
    INFO:pytorch:[1,   9] loss: -973.106
    INFO:pytorch:[1,  10] loss: -2390.099
    INFO:pytorch:[1,  11] loss: -3500.059
    INFO:pytorch:[1,  12] loss: -2850.755
    INFO:pytorch:[1,  13] loss: -4785.271
    INFO:pytorch:[1,  14] loss: -5666.863
    INFO:pytorch:[1,  15] loss: -6398.563
    INFO:pytorch:[1,  16] loss: 6809.037
    INFO:pytorch:[1,  17] loss: 6204.096
    INFO:pytorch:[1,  18] loss: -9287.712
    INFO:pytorch:[1,  19] loss: -3614.344
    
    opened by shartoo 3
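    For context: the WaveGlow objective is a negative log-likelihood of continuous audio, which includes log-determinant terms and is therefore unbounded below, so negative values are expected once training progresses. A minimal sketch of the loss (sigma=1.0, a common default) showing how it goes negative with toy values:

        import torch

        def waveglow_loss(z, log_s_list, log_det_w_list, sigma=1.0):
            # Negative log-likelihood from the paper: z'z / (2*sigma^2)
            # minus the coupling and 1x1-conv log-determinant terms.
            loss = torch.sum(z * z) / (2 * sigma ** 2)
            for log_s in log_s_list:
                loss = loss - torch.sum(log_s)
            for log_det_w in log_det_w_list:
                loss = loss - torch.sum(log_det_w)
            return loss / z.numel()  # averaged per element

        # Toy values: large coupling scales push the loss below zero,
        # which is normal for a continuous log-density.
        z = torch.zeros(1, 8, 100)
        print(waveglow_loss(z, [torch.full((1, 4, 100), 2.0)],
                            [torch.tensor(3.0)]))  # negative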
Owner

Yuchao Zhang (speech synthesis/machine learning)