AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

Related tags

Deep Learning AdaSpeech2

Overview

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data [WIP]

Unofficial Pytorch implementation of AdaSpeech 2.

Requirements :

All code written in Python 3.6.2 .

Install Pytorch

Before installing pytorch please check your Cuda version by running following command : nvcc --version

pip install torch torchvision

In this repo I have used Pytorch 1.6.0 for torch.bucketize feature which is not present in previous versions of PyTorch.

Installing other requirements :

pip install -r requirements.txt

To use Tensorboard install tensorboard version 1.14.0 seperatly with supported tensorflow (1.14.0)

For Preprocessing :

filelists folder contains MFA (Motreal Force aligner) processed LJSpeech dataset files so you don't need to align text with audio (for extract duration) for LJSpeech dataset. For other dataset follow instruction here. For other pre-processing run following command :

python nvidia_preprocessing.py -d path_of_wavs

For finding the min and max of F0 and Energy

python compute_statistics.py

Update the following in hparams.py by min and max of F0 and Energy

p_min = Min F0/pitch
p_max = Max F0
e_min = Min energy
e_max = Max energy

Training :

[WIP]

Citations :

@misc{chen2021adaspeech,
      title={AdaSpeech: Adaptive Text to Speech for Custom Voice}, 
      author={Mingjian Chen and Xu Tan and Bohan Li and Yanqing Liu and Tao Qin and Sheng Zhao and Tie-Yan Liu},
      year={2021},
      eprint={2103.00993},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

@misc{yan2021adaspeech,
      title={AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data}, 
      author={Yuzi Yan and Xu Tan and Bohan Li and Tao Qin and Sheng Zhao and Yuan Shen and Tie-Yan Liu},
      year={2021},
      eprint={2104.09715},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

You might also like...

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech Jaehyeon Kim, Jungil Kong, and Juhee Son In our rece

1.7k Jan 8, 2023

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

WaveGrad2 - PyTorch Implementation PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis. Status (202

59 Dec 6, 2022

Pytorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech"

GradTTS Unofficial Pytorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech" (arxiv) About this repo This is an unoffic

103 Dec 23, 2022

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

VAENAR-TTS - PyTorch Implementation PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

67 Nov 14, 2022

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Deepvoice3_pytorch PyTorch implementation of convolutional networks-based text-to-speech synthesis models: arXiv:1710.07654: Deep Voice 3: Scaling Tex

1.8k Jan 8, 2023

PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech

PortaSpeech - PyTorch Implementation PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech. Model Size Module Nor

279 Jan 4, 2023

Combine Tacotron2 and Hifi GAN to generate speech from text

EndToEndTextToSpeech Combine Tacotron2 and Hifi GAN to generate speech from text Download weights Hifi GAN - hifi_gan/checkpoint/ : pretrain 2.5M ste

1 Dec 18, 2021

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

DiffGAN-TTS - PyTorch Implementation PyTorch implementation of DiffGAN-TTS: High

157 Jan 1, 2023

🐤 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation

🐤 Nix-TTS An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji

156 Jan 9, 2023

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

Related tags

Overview

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data [WIP]

Requirements :

For Preprocessing :

Training :

Citations :

You might also like...

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

Pytorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech"

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech

Combine Tacotron2 and Hifi GAN to generate speech from text

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

🐤 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation

Owner

Rishikesh (ऋषिकेश)

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech

African language Speech Recognition - Speech-to-Text

ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)

Pytorch re-implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022)

PyTorch implementation of ARM-Net: Adaptive Relation Modeling Network for Structured Data.

[NeurIPS 2021] Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data

Chinese Mandarin tts text-to-speech 中文 (普通话) 语音合成 , by fastspeech 2 , implemented in pytorch, using waveglow as vocoder,

SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

Related tags

Overview

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data [WIP]

Requirements :

For Preprocessing :

Training :

Citations :

You might also like...

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

Pytorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech"

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech

Combine Tacotron2 and Hifi GAN to generate speech from text

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

🐤 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation

Owner

Rishikesh (ऋषिकेश)

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech

African language Speech Recognition - Speech-to-Text

ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)

Pytorch re-implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022)

PyTorch implementation of ARM-Net: Adaptive Relation Modeling Network for Structured Data.

[NeurIPS 2021] Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data

Chinese Mandarin tts text-to-speech 中文 (普通话) 语音 合成 , by fastspeech 2 , implemented in pytorch, using waveglow as vocoder,

SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

Chinese Mandarin tts text-to-speech 中文 (普通话) 语音合成 , by fastspeech 2 , implemented in pytorch, using waveglow as vocoder,