EfficientTTS
Unofficial PyTorch implementation of "EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture" (arXiv).

Disclaimer: Some people mistakenly think I am one of the authors. In fact, I am not even on the author list of this paper; I am just a TTS enthusiast. Some important details of the implementation are not presented in the paper, so some model parameters in the current version are based on my own understanding and experiments and may not be consistent with those used by the authors.
Updates
2020/12/23: Mandarin Chinese samples uploaded. The experiment setting is exactly the same as in the LJSpeech example. A complete description of the usage will be uploaded soon.
2020/12/20: Using a HifiGAN fine-tuned with Tacotron2 GTA mel spectrograms improves the quality of the generated samples; please see the newly updated generated-samples.
Current status
- Implementation of EFTS-CNN + HifiGAN
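For readers curious about the core idea behind EFTS, below is a minimal PyTorch sketch of the index mapping vector (IMV) and the monotonic alignment reconstruction as I understand them from the paper. The function name, tensor shapes, the cumulative-ReLU monotonicity trick, and the sigma temperature are my own assumptions and may differ from the authors' exact formulation.

# Minimal sketch of the IMV idea (assumptions noted above, not the reference code).
import torch
import torch.nn.functional as F

def monotonic_alignment_from_attention(attn, sigma=0.2):
    """attn: (B, T_mel, T_text) soft attention; each row sums to 1 over text positions."""
    B, T_mel, T_text = attn.shape
    positions = torch.arange(T_text, device=attn.device, dtype=attn.dtype)

    # Index mapping vector: expected text position for every mel frame.
    imv = (attn * positions).sum(-1)                              # (B, T_mel)

    # Enforce monotonicity: keep only non-negative increments, then accumulate.
    delta = F.relu(imv[:, 1:] - imv[:, :-1])
    imv_mono = torch.cat([imv[:, :1], imv[:, :1] + delta.cumsum(-1)], dim=-1)

    # Rescale so the mapping ends at the last text index.
    imv_mono = imv_mono * (T_text - 1) / imv_mono[:, -1:].clamp(min=1e-8)

    # Reconstruct a sharp, monotonic alignment from the IMV with a Gaussian kernel.
    energies = -((positions[None, None, :] - imv_mono[..., None]) ** 2) / sigma
    return torch.softmax(energies, dim=-1)                        # (B, T_mel, T_text)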
Setup with virtualenv
$ cd tools
$ make
# If you want to use distributed training, please run the following
# command to install apex.
$ make apex
Note: If you want to specify the Python version, CUDA version, or PyTorch version, please run, for example:
$ make PYTHON=3.7 CUDA_VERSION=10.1 PYTORCH_VERSION=1.6
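If the Makefile follows the ParallelWaveGAN setup this repo is based on, the virtual environment is created under tools/venv; the path below is an assumption, so check tools/Makefile if activation fails.
$ source tools/venv/bin/activate  # assumed location of the created virtualenv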
Training
Please go to the egs/lj folder and see run.sh for an example of usage.
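A typical invocation is shown below. The stage flags follow the ParallelWaveGAN-style recipe convention this repo is built on and are an assumption; check run.sh for the options it actually supports.
$ cd egs/lj
# Run the whole recipe (data preparation, feature extraction, training).
$ ./run.sh
# Or run only part of the recipe, if stage flags are supported as in ParallelWaveGAN recipes.
$ ./run.sh --stage 0 --stop_stage 1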
Acknowledgement
The code framework is based on https://github.com/kan-bayashi/ParallelWaveGAN.