The official implementation of VAENAR-TTS, a VAE-based non-autoregressive TTS model.

Overview

VAENAR-TTS

This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis".

Samples | Paper | Pretrained Models

Usage

0. Dataset

  1. English: LJSpeech
  2. Mandarin: DataBaker (标贝)

1. Environment setup

conda env create -f environment.yml
conda activate vaenartts-env
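
To verify the setup (a quick check; this assumes the environment provides TensorFlow 2.x, which the TF_FORCE_GPU_ALLOW_GROWTH flag used below implies):

python -c "import tensorflow as tf; print(tf.__version__, tf.config.list_physical_devices('GPU'))"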

2. Data pre-processing

For English using LJSpeech:

CUDA_VISIBLE_DEVICES= python preprocess.py --dataset ljspeech --data_dir /path/to/extracted/LJSpeech-1.1 --save_dir ./ljspeech

For Mandarin using DataBaker (标贝):

CUDA_VISIBLE_DEVICES= python preprocess.py --dataset databaker --data_dir /path/to/extracted/biaobei --save_dir ./databaker
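
Note that the empty CUDA_VISIBLE_DEVICES= hides all GPUs, so preprocessing runs on CPU. Afterwards the save directory should contain the TFRecords consumed by training; a quick check (the exact layout may differ):

ls ./ljspeech/tfrecords/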

3. Training

For English using LJSpeech:

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset ljspeech --log_dir ./lj-log_dir --test_dir ./lj-test_dir --data_dir ./ljspeech/tfrecords/ --model_dir ./lj-model_dir

For Mandarin using DataBaker (标贝):

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset databaker --log_dir ./db-log_dir --test_dir ./db-test_dir --data_dir ./databaker/tfrecords/ --model_dir ./db-model_dir
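
Training can be monitored with TensorBoard, assuming train.py writes standard TensorFlow summaries to --log_dir:

tensorboard --logdir ./lj-log_dir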

4. Inference (synthesize speech for the whole test set)

For English using LJSpeech:

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset ljspeech --test_dir ./lj-test-2000 --data_dir ./ljspeech/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./lj-model_dir/ckpt-2000

For Mandarin using DataBaker (标贝):

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset databaker --test_dir ./db-test-2000 --data_dir ./databaker/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./db-model_dir/ckpt-2000
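
--ckpt_path points at a specific checkpoint prefix (here ckpt-2000). To find the newest checkpoint in a model directory, a generic TensorFlow helper (not repo-specific) can be used:

python -c "import tensorflow as tf; print(tf.train.latest_checkpoint('./lj-model_dir'))"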

Reference

  1. XuezheMax/flowseq
  2. keithito/tacotron
Comments
  • Can I use a GAN-based network to replace the flow-based prior P(Z|X)?

    If I understand this paper and FlowSeq correctly, the normalizing flow is used to model the dependence on the text X (from the posterior P(Z|X, Y)). Since a GAN can also model a distribution, can I use a GAN-based network to replace the flow-based prior P(Z|X)?

    opened by seekerzz 4
  • Different result

    Hi, thank you for your awesome work. I tried to synthesize using the inference script and the checkpoint you provided, but the result sounds robotic. Why is that? Am I missing something?

    opened by kikirizki 2
  • Type mismatch exception when running the code

    An exception occurred while running the preprocess.py step:

    Traceback (most recent call last):
      File "train.py", line 329, in <module>
        main()
      File "train.py", line 257, in main
        for fids, texts, mels, t_lengths, m_lengths in train_set.take(1):
      File "/home/wuyx/miniconda3/envs/vc/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 761, in __next__
        return self._next_internal()
      File "/home/wuyx/miniconda3/envs/vc/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 747, in _next_internal
        output_shapes=self._flat_output_shapes)
      File "/home/wuyx/miniconda3/envs/vc/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2728, in iterator_get_next
        _ops.raise_from_not_ok_status(e, name)
      File "/home/wuyx/miniconda3/envs/vc/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6897, in raise_from_not_ok_status
        six.raise_from(core._status_to_exception(e.code, message), None)
      File "<string>", line 3, in raise_from
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Type mismatch between parsed tensor (float) and dtype (double)
      [[{{node ParseTensor_1}}]] [Op:IteratorGetNext]
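
    This error means the tensors in the TFRecords were serialized with one float dtype while the dataset pipeline parses them with another (float32 vs. float64). A minimal sketch of the usual fix, casting to a single dtype on both the write and the read side; the variable names are illustrative, not the repo's actual code:

    import numpy as np
    import tensorflow as tf

    # Hypothetical mel spectrogram; NumPy defaults to float64, which is
    # exactly what triggers a mismatch against a float32 reader.
    mel = np.random.randn(80, 100)

    # Write side: cast to float32 before serializing into the TFRecord.
    serialized = tf.io.serialize_tensor(tf.cast(mel, tf.float32))

    # Read side: out_type must match the dtype used at write time, otherwise
    # tf.io.parse_tensor raises "Type mismatch between parsed tensor ... and dtype ...".
    parsed = tf.io.parse_tensor(serialized, out_type=tf.float32)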

    opened by wuyx517 2
  • About the log probability?

    In the posterior.py code:

    time_level_log_probs = -0.5 * (tf.cast(dim, tf.float32) * tf.math.log(2 * np.pi) + tf.reduce_sum(expanded_logvar + normalized_samples ** 2., axis=3))

    But the log-density of a Gaussian is:

    log_probs = log(1.0 / (sqrt(2.0 * pi) * std) * exp(-0.5 * (x - u) ** 2 / std ** 2))
              = -0.5 * (log(2.0 * pi) + 2.0 * log(std) + (x - u) ** 2 / std ** 2)

    So a 2.0 factor on expanded_logvar seems to be missing, although it may not matter in practice? The corrected line would be:

    time_level_log_probs = -0.5 * (tf.cast(dim, tf.float32) * tf.math.log(2 * np.pi) + tf.reduce_sum(2.0 * expanded_logvar + normalized_samples ** 2., axis=3))
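
    Whether the 2.0 factor is needed depends on whether expanded_logvar stores log(σ²) (log-variance, in which case the original line is already correct) or log(σ) (log-std, in which case the factor is missing). A standalone check of the log-variance convention, illustrative and not from the repo:

    import numpy as np
    from scipy.stats import norm

    x, mu, std = 0.7, 0.1, 1.3
    logvar = np.log(std ** 2)   # convention: logvar = log(sigma^2)
    z = (x - mu) / std          # the "normalized sample"

    # With logvar = log(sigma^2), no extra 2.0 factor is needed;
    # with logvar = log(sigma), the 2.0 factor would be required.
    manual = -0.5 * (np.log(2 * np.pi) + logvar + z ** 2)
    print(manual, norm.logpdf(x, loc=mu, scale=std))  # the two values agree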

    opened by BridgetteSong 1
  • synthesized wavs of long texts

    I downloaded the pretrained DataBaker model and synthesized wavs using inference.py. The results are not very good: the alignment is wrong, especially when the input text is long. For example, for "失恋的人特别喜欢往人烟罕至的角落里钻。", the synthesized wav sounds like 失恋的人特别喜欢往人烟罕至的_角角落里钻钻钻钻_.

    For even longer input texts, the synthesized wavs are completely wrong.

    opened by Liujingxiu23 2
  • The HiFi-GAN config used when generating samples

    Hi, I would like to know which config you used when training the HiFi-GAN model on DataBaker to produce the samples on the website https://light1726.github.io/vaenar-tts/.
    With these parameters clarified, we can better compare the quality of the synthesized wavs with other SOTA acoustic models.

    I mean the following three parameters in the config file:
    "upsample_rates":
    "upsample_kernel_sizes":
    "upsample_initial_channel":

    Thank you!
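
    For reference only (not an answer from the authors): the official HiFi-GAN config_v1.json for 22.05 kHz audio uses the values below; whether the DataBaker samples used the same settings is exactly what this issue asks.

    # Reference values from the official HiFi-GAN config_v1.json (22.05 kHz):
    hifigan_v1 = {
        "upsample_rates": [8, 8, 2, 2],          # product = 256 = hop size
        "upsample_kernel_sizes": [16, 16, 4, 4],
        "upsample_initial_channel": 512,
    }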

    opened by Liujingxiu23 14
Owner
THUHCSI
Human-Computer Speech Interaction Lab at Tsinghua University