🚀 Clone a voice in 5 seconds to generate arbitrary speech in real-time

Overview

MockingBird (MIT License)


Features

🌍 Chinese: Mandarin is supported and tested with multiple datasets: aidatatang_200zh, magicdata, aishell3, data_aishell, etc.

🤩 PyTorch: works with PyTorch, tested with version 1.9.0 (latest as of August 2021) on Tesla T4 and GTX 2060 GPUs

🌍 Windows + Linux: runs on both Windows and Linux (even on M1 macOS)

🤩 Easy & Awesome: good results with only a newly trained synthesizer, by reusing the pretrained encoder/vocoder

🌍 Web server: ready to serve your results via remote calls

DEMO VIDEO

Quick Start

1. Install Requirements

Follow the original repo to check that your environment is ready. **Python 3.7 or higher** is required to run the toolbox.

If you get ERROR: Could not find a version that satisfies the requirement torch==1.9.0+cu102 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2), the error is probably due to a mismatched Python version; try 3.9 and it should install successfully.
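A quick way to rule this out before installing, as a minimal sketch in Python:

    import sys

    # torch==1.9.0 wheels are only published for certain CPython versions;
    # check which interpreter pip will use before installing
    print(sys.version)
    assert sys.version_info >= (3, 7), "Python 3.7+ is required for this toolbox"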

  • Install ffmpeg.
  • Run pip install -r requirements.txt to install the remaining necessary packages.
  • Install webrtcvad with pip install webrtcvad-wheels if you need it.

Note that we reuse the pretrained encoder/vocoder but not the synthesizer, since the original model is incompatible with Chinese symbols. This means demo_cli does not work at the moment.
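To confirm the environment is ready, a minimal sanity check (assuming the steps above completed; webrtcvad is optional):

    import shutil

    import torch

    # ffmpeg must be on PATH for audio preprocessing
    assert shutil.which("ffmpeg") is not None, "ffmpeg not found on PATH"
    print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())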

2. Prepare your models

You can either train your models or use existing ones:

2.1 Train encoder with your dataset (Optional)

  • Preprocess the audios and the mel spectrograms: python encoder_preprocess.py <datasets_root>. The parameter --dataset {dataset} selects the datasets you want to preprocess; only the train set of these datasets is used. Possible names: librispeech_other, voxceleb1, voxceleb2. Use a comma to separate multiple datasets.

  • Train the encoder: python encoder_train.py my_run <datasets_root>/SV2TTS/encoder

For training, the encoder uses visdom. You can disable it with --no_visdom (e.g. python encoder_train.py my_run <datasets_root>/SV2TTS/encoder --no_visdom), but it's nice to have. Run visdom in a separate CLI/process to start your visdom server.

2.2 Train synthesizer with your dataset

  • Download a dataset and unzip it: make sure you can access all .wav files in the folder

  • Preprocess the audios and the mel spectrograms: python pre.py <datasets_root>. The parameter --dataset {dataset} selects among aidatatang_200zh, magicdata, aishell3, data_aishell, etc. If this parameter is not passed, the default dataset is aidatatang_200zh.

  • Train the synthesizer: python synthesizer_train.py mandarin <datasets_root>/SV2TTS/synthesizer

  • Go to the next step when the attention line appears and the loss meets your needs; checkpoints land in the training folder synthesizer/saved_models/ (see the sketch below for finding the latest one).
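To see which checkpoints you have so far, a minimal sketch (adjust the path to your run name, e.g. mandarin from the training command above):

    from pathlib import Path

    # List synthesizer checkpoints, oldest to newest, to pick the latest one
    ckpts = sorted(Path("synthesizer/saved_models").glob("**/*.pt"),
                   key=lambda p: p.stat().st_mtime)
    for ckpt in ckpts:
        print(ckpt)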

2.3 Use pretrained model of synthesizer

Thanks to the community, some pretrained models are shared:

| Author | Download link | Preview video | Info |
| --- | --- | --- | --- |
| @author | https://pan.baidu.com/s/1iONvRxmkI-t1nHqxKytY3g (Baidu, code: 4j5d) | | 75k steps, trained on multiple datasets |
| @author | https://pan.baidu.com/s/1fMh9IlgKJlL2PIiRTYDUvw (Baidu, code: om7f) | | 25k steps, trained on multiple datasets; only works under version 0.0.1 |
| @FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing or https://u.teknik.io/AYxWf.pt | input / output | 200k steps with a local accent of Taiwan; only works under version 0.0.1 |
| @miven | https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ (code: 2021) | https://www.bilibili.com/video/BV1uh411B7AD/ | only works under version 0.0.1 |

2.4 Train vocoder (Optional)

Note: the vocoder makes little difference to the output quality, so you may not need to train a new one.

  • Preprocess the data: python vocoder_preprocess.py <datasets_root> -m <synthesizer_model_path>

Replace <datasets_root> with your dataset root and <synthesizer_model_path> with the directory of your best trained synthesizer models, e.g. synthesizer\saved_models\xxx

  • Train the wavernn vocoder: python vocoder_train.py mandarin <datasets_root>

  • Train the hifigan vocoder: python vocoder_train.py mandarin <datasets_root> hifigan

3. Launch

3.1 Using the web server

You can then run python web.py and open it in a browser, by default at http://localhost:8080
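For remote calling, a minimal sketch that only checks the server is reachable (the actual HTTP routes live in web.py; port 8080 is the default noted above):

    import urllib.request

    # Expect an HTTP 200 once web.py is serving on the default port
    with urllib.request.urlopen("http://localhost:8080") as resp:
        print("server responded:", resp.status)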

3.2 Using the Toolbox

You can then try the toolbox: python demo_toolbox.py -d <datasets_root>

Reference

This repository is forked from Real-Time-Voice-Cloning, which only supports English.

| URL | Designation | Title | Implementation source |
| --- | --- | --- | --- |
| 1803.09017 | GlobalStyleToken (synthesizer) | Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis | This repo |
| 2010.05646 | HiFi-GAN (vocoder) | Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | This repo |
| 1806.04558 | SV2TTS | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | This repo |
| 1802.08435 | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | fatchord/WaveRNN |
| 1703.10135 | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | fatchord/WaveRNN |
| 1710.10467 | GE2E (encoder) | Generalized End-To-End Loss for Speaker Verification | This repo |

FAQ

1. Where can I download the dataset?

| Dataset | Original source | Alternative sources |
| --- | --- | --- |
| aidatatang_200zh | OpenSLR | Google Drive |
| magicdata | OpenSLR | Google Drive (dev set) |
| aishell3 | OpenSLR | Google Drive |
| data_aishell | OpenSLR | |

After unzipping aidatatang_200zh, you also need to unzip all the files under aidatatang_200zh\corpus\train.

2. What is <datasets_root>?

If the dataset path is D:\data\aidatatang_200zh, then <datasets_root> is D:\data.
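The same relationship in Python, as a small illustration (the paths are just examples):

    from pathlib import Path

    datasets_root = Path(r"D:\data")                  # this is <datasets_root>
    dataset_dir = datasets_root / "aidatatang_200zh"  # the dataset itself
    print(datasets_root, dataset_dir, sep="\n")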

3. Not enough VRAM

Train the synthesizer: adjust the batch_size in synthesizer/hparams.py

# Before
tts_schedule = [(2,  1e-3,  20_000,  12),   # Progressive training schedule
                (2,  5e-4,  40_000,  12),   # (r, lr, step, batch_size)
                (2,  2e-4,  80_000,  12),   #
                (2,  1e-4, 160_000,  12),   # r = reduction factor (# of mel frames
                (2,  3e-5, 320_000,  12),   #     synthesized for each decoder iteration)
                (2,  1e-5, 640_000,  12)],  # lr = learning rate
# After
tts_schedule = [(2,  1e-3,  20_000,  8),   # Progressive training schedule
                (2,  5e-4,  40_000,  8),   # (r, lr, step, batch_size)
                (2,  2e-4,  80_000,  8),   #
                (2,  1e-4, 160_000,  8),   # r = reduction factor (# of mel frames
                (2,  3e-5, 320_000,  8),   #     synthesized for each decoder iteration)
                (2,  1e-5, 640_000,  8)],  # lr = learning rate

Train vocoder - Preprocess the data: adjust the synthesis_batch_size in synthesizer/hparams.py

# Before
### Data Preprocessing
        max_mel_frames = 900,
        rescale = True,
        rescaling_max = 0.9,
        synthesis_batch_size = 16,                  # For vocoder preprocessing and inference.
# After
### Data Preprocessing
        max_mel_frames = 900,
        rescale = True,
        rescaling_max = 0.9,
        synthesis_batch_size = 8,                  # For vocoder preprocessing and inference.

Train vocoder - Train the vocoder: adjust the voc_batch_size in vocoder/wavernn/hparams.py

# Before
# Training
voc_batch_size = 100
voc_lr = 1e-4
voc_gen_at_checkpoint = 5
voc_pad = 2

# After
# Training
voc_batch_size = 6
voc_lr = 1e-4
voc_gen_at_checkpoint = 5
voc_pad = 2

4. What if this happens: RuntimeError: Error(s) in loading state_dict for Tacotron: size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([70, 512]) from checkpoint, the shape in current model is torch.Size([75, 512]).

Please refer to issue #37

5. How to improve CPU and GPU occupancy rate?

Adjust the batch_size as appropriate to improve utilization.

6. What if "the page file is too small to complete the operation" occurs?

Please refer to this video and increase the virtual memory to 100 GB (102400 MB). For example, if the files are placed on drive D, change the virtual memory of drive D.

7. When should I stop during training?

FYI, my attention line appeared after 18k steps and the loss dropped below 0.4 after 50k steps. (Samples: attention_step_20500_sample_1, step-135500-mel-spectrogram_sample_1)

Comments
  • Help: running requirements.txt reports No module named 'pyworld' — what is the problem?

    Already on the latest code:

    E:\MockingBird\MockingBird>python demo_toolbox.py
    Traceback (most recent call last):
      File "E:\MockingBird\MockingBird\demo_toolbox.py", line 2, in <module>
        from toolbox import Toolbox
      File "E:\MockingBird\MockingBird\toolbox\__init__.py", line 9, in <module>
        from utils.f0_utils import compute_f0, f02lf0, compute_mean_std, get_converted_lf0uv
      File "E:\MockingBird\MockingBird\utils\f0_utils.py", line 3, in <module>
        import pyworld
    ModuleNotFoundError: No module named 'pyworld'

    bug 
    opened by bricklayers 14
  • Synthesizer training fails to converge

    Problem summary: When training the synthesizer model with my own dataset, I trained the synthesizer after preprocessing and replaced the synthesizer with an existing model, but the resulting plots did not converge.

    Reproduction & environment

    I followed www.bilibili.com/video/BV1dq4y137pH. The code version is the main branch. After data preprocessing, I first trained the synthesizer as in the video, then replaced the current model with pretrained-11-7-21 and continued training. The plots did not converge. Screenshots: qX67L9.png qX6bZR.png

    opened by akiaki1996 13
  • A question about training the synthesizer — please advise!

    Hello, I have downloaded the aidatatang_200zh dataset and unzipped all the files under aidatatang_200zh\corpus\train, but when I run python synthesizer_preprocess_audio.py D:\google download (I put the files under the path D:\google download), the following happens: D:\python_demo\Realtime-Voice-Clone-Chinese>python synthesizer_preprocess_audio.py D:\google download\ D:\python_demo\Realtime-Voice-Clone-Chinese\encoder\audio.py:13: UserWarning: Unable to import 'webrtcvad'. This package enables noise removal and is recommended. warn("Unable to import 'webrtcvad'. This package enables noise removal and is recommended.") usage: synthesizer_preprocess_audio.py [-h] [-o OUT_DIR] [-n N_PROCESSES] [-s] [--hparams HPARAMS] [--no_trim] [--no_alignments] [--dataset DATASET] datasets_root synthesizer_preprocess_audio.py: error: unrecognized arguments: download\

    How can I solve this? I looked through earlier issues and found nothing similar. Below are the places I think might be wrong; I'd appreciate the author's answer, thanks!

    1. I only unzipped the files under aidatatang_200zh\corpus\train — do the files in the other folders need to be unzipped too? 2. Should I instead pull all the wav files out into aidatatang_200zh\corpus\train and then run python synthesizer_preprocess_audio.py D:\google download? 3. The command I typed is wrong. 4. Do the wav and txt files need preprocessing in advance that I have not done?

    opened by XiuChen-Liu 13
  • Training with a community-shared model errors out; cause unknown

    Training with a community-shared model errors out and I don't know why. Also, I don't know how to save the model — is it only auto-saved every 500 steps? Any help would be appreciated, thanks! RuntimeError: The size of tensor a (1024) must match the size of tensor b (3) at non-singleton dimension 3. Screenshot 2021-11-28 021942

    opened by johnwestin 12
  • GPU memory blows up while training the model

    Variable._execution_engine.run_backward(RuntimeError: CUDA out of memory. Tried to allocate 88.00 MiB (GPU 0; 4.00 GiB total capacity; 2.68 GiB already allocated; 0 bytes free; 2.85 GiB reserved in total by PyTorch)

    Could you provide a parameter to adjust batch_size? My GPU only has 4 GB of VRAM (GTX 1050 Ti), and training with default parameters frequently runs out of memory....

    opened by cronfox 11
  • How to fix the error DLL load failed: the page file is too small to complete the operation when running python synthesizer_preprocess_audio.py

    I hit the above error when running python synthesizer_preprocess_audio.py. I found a fix on CSDN: 1. If the Python environment is not on drive C: open Advanced system settings -> Advanced -> Performance Settings -> Advanced -> Virtual memory -> Change -> uncheck automatic management of paging file size for all drives -> Custom size -> set both the initial and maximum size to 10240. 2. Change the num_worker parameter of the DataLoader to 0. But I'm not sure how to actually set that parameter to 0.

    opened by 9527-567 11
  • capturable=False assertion error

    Win11, GPU: 3060 laptop
    Python 3.9.13

    +----------------+------------+---------------+------------------+
    | Steps with r=2 | Batch Size | Learning Rate | Outputs/Step (r) |
    +----------------+------------+---------------+------------------+
    |   101k Steps   |     16     |     3e-06     |        2         |
    +----------------+------------+---------------+------------------+

    Could not load symbol cublasGetSmCountTarget from cublas64_11.dll. Error code 127
    Traceback (most recent call last):
      File "G:\AIvioce\MockingBird\synthesizer_train.py", line 37, in <module>
        train(**vars(args))
      File "G:\AIvioce\MockingBird\synthesizer\train.py", line 216, in train
        optimizer.step()
      File "C:\Users\Mark\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\optim\optimizer.py", line 109, in wrapper
        return func(*args, **kwargs)
      File "C:\Users\Mark\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "C:\Users\Mark\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\optim\adam.py", line 157, in step
        adam(params_with_grad,
      File "C:\Users\Mark\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\optim\adam.py", line 213, in adam
        func(params,
      File "C:\Users\Mark\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\optim\adam.py", line 255, in _single_tensor_adam
        assert not step_t.is_cuda, "If capturable=False, state_steps should not be CUDA tensors."
    AssertionError: If capturable=False, state_steps should not be CUDA tensors.

    opened by MarkIzhao 10
  • Help!!! pip install -r requirements.txt fails while installing the remaining packages — does anyone know how to fix it?

    Here is the error output: Building wheels for collected packages: ctc-segmentation, pyworld Building wheel for ctc-segmentation (setup.py) ... error error: subprocess-exited-with-error

    × python setup.py bdist_wheel did not run successfully. │ exit code: 1 ╰─> [12 lines of output] running bdist_wheel running build running build_py creating build creating build\lib.win-amd64-3.7 creating build\lib.win-amd64-3.7\ctc_segmentation copying ctc_segmentation\ctc_segmentation.py -> build\lib.win-amd64-3.7\ctc_segmentation copying ctc_segmentation\partitioning.py -> build\lib.win-amd64-3.7\ctc_segmentation copying ctc_segmentation\__init__.py -> build\lib.win-amd64-3.7\ctc_segmentation running build_ext building 'ctc_segmentation.ctc_segmentation_dyn' extension error: Microsoft Visual C++ 14.0 is required. Get it with "Build Tools for Visual Studio": https://visualstudio.microsoft.com/downloads/ [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip. ERROR: Failed building wheel for ctc-segmentation Running setup.py clean for ctc-segmentation Building wheel for pyworld (pyproject.toml) ... error error: subprocess-exited-with-error

    × Building wheel for pyworld (pyproject.toml) did not run successfully. │ exit code: 1 ╰─> [13 lines of output] running bdist_wheel running build running build_py creating build creating build\lib.win-amd64-3.7 creating build\lib.win-amd64-3.7\pyworld copying pyworld\__init__.py -> build\lib.win-amd64-3.7\pyworld running build_ext skipping 'pyworld\pyworld.cpp' Cython extension (up-to-date) building 'pyworld.pyworld' extension C:\Users\ADMINI~1\AppData\Local\Temp\pip-build-env-_y7fbfzj\overlay\Lib\site-packages\setuptools\dist.py:741: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead % (opt, underscore_opt) error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/ [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip. ERROR: Failed building wheel for pyworld Failed to build ctc-segmentation pyworld ERROR: Could not build wheels for pyworld, which is required to install pyproject.toml-based projects

    opened by frankl07 10
  • What should I do about this problem when training the model? GPU memory seems insufficient. CUDA out of memory. Tried to allocate 122.00 MiB (GPU 0; 4.00 GiB total capacity; 3.15 GiB already allocated; 0 bytes free; 3.45 GiB reserved in total by PyTorch

    Summary: What should I do about this problem when training the model? GPU memory seems insufficient. CUDA out of memory. Tried to allocate 122.00 MiB (GPU 0; 4.00 GiB total capacity; 3.15 GiB already allocated; 0 bytes free; 3.45 GiB reserved in total by PyTorch

    Env & To Reproduce: python 3.9, NVIDIA GeForce GTX 1050Ti (4 GB)

    Screenshots: image image

    opened by pzhyyd 9
  • AttributeError: module 'setuptools._distutils' has no attribute 'version'

    F:\VideoCentTools\MockingBird-main>python synthesizer_train.py offhen F:\VideoCentTools/SV2TTS/synthesizer
    Traceback (most recent call last):
      File "F:\VideoCentTools\MockingBird-main\synthesizer_train.py", line 2, in <module>
        from synthesizer.train import train
      File "F:\VideoCentTools\MockingBird-main\synthesizer\train.py", line 5, in <module>
        from torch.utils.tensorboard import SummaryWriter
      File "C:\Users\Administrator\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\tensorboard\__init__.py", line 4, in <module>
        LooseVersion = distutils.version.LooseVersion
    AttributeError: module 'setuptools._distutils' has no attribute 'version'

    opened by Dustwinddd 9
  • FileNotFoundError: [Errno 2] No such file or directory: 'encoder\\saved_models\\pretrained.pt'

    I put the downloaded model under D:\声音克隆\MockingBird-main\synthesizer\saved_models, and also put a copy of the model my_run renamed to pretrained.pt under D:\声音克隆\MockingBird-main\encoder\saved_models, then ran web.py:

    (base) C:\Users\13549>python D:\声音克隆\MockingBird-main\web.py
    Loaded synthesizer models: 0
    Traceback (most recent call last):
      File "D:\声音克隆\MockingBird-main\web.py", line 6, in <module>
        app = webApp()
      File "D:\声音克隆\MockingBird-main\web\__init__.py", line 33, in webApp
        encoder.load_model(Path("encoder/saved_models/pretrained.pt"))
      File "D:\声音克隆\MockingBird-main\encoder\inference.py", line 33, in load_model
        checkpoint = torch.load(weights_fpath, _device)
      File "D:\anaconda\lib\site-packages\torch\serialization.py", line 525, in load
        with _open_file_like(f, 'rb') as opened_file:
      File "D:\anaconda\lib\site-packages\torch\serialization.py", line 212, in _open_file_like
        return _open_file(name_or_buffer, mode)
      File "D:\anaconda\lib\site-packages\torch\serialization.py", line 193, in __init__
        super(_open_file, self).__init__(open(name, mode))
    FileNotFoundError: [Errno 2] No such file or directory: 'encoder\saved_models\pretrained.pt'

    What should I do? Even after renaming the model file to pretrained, I still get the same error.

    opened by qinan-nlx 9
  • Too-high pyworld version causes ValueError: numpy.ndarray size changed...

    I set up the environment today and, when doing voice conversion, hit ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject. At that point python=3.9.15, numpy=1.19.3, pyworld=0.3.2. The error message pointed at pyworld; after downgrading pyworld to 0.3.0 the error went away. Maybe requirements.txt should pin the pyworld version?

    opened by carseny 0
  • Attention plot not showing after training to 18k steps

    Summary: after training to 18k steps, the attention plot does not show.

    Env & To Reproduce:

    win10, anaconda virtual environment with python=3.9.7, code downloaded on 2022-12-20, dataset aidatatang_200zh.

    The computer is a MacBook Pro 2019

    CPU: Intel(R) Core(TM) i9-9880H @ 2.3 GHz

    Screenshots: attention_step_18500_sample_1

    step-18500-mel-spectrogram_sample_1

    opened by zijubk 0
  • Help: AttributeError: module 'umap' has no attribute 'UMAP'

    Summary: running on Windows reports module 'umap' has no attribute 'UMAP'; the problem occurs both when training the vocoder and when launching the demo.

    Env & To Reproduce:

    Traceback (most recent call last):
      File "D:\MockingBird\encoder_train.py", line 46, in <module>
        train(**vars(args))
      File "D:\MockingBird\encoder\train.py", line 100, in train
        vis.draw_projections(embeds, utterances_per_speaker, step, projection_fpath)
      File "D:\MockingBird\encoder\visualizations.py", line 164, in draw_projections
        reducer = umap.UMAP()
    AttributeError: module 'umap' has no attribute 'UMAP'
    
    opened by heziyu2025 0
  • Newbie question: running the toolbox reports "AttributeError: 'Toolbox' object has no attribute 'selected_source_utterance'"

    Screenshots: 微信截图_20221215215700, 微信截图_20221215215735

    When loading a dataset and running the toolbox, it reports: AttributeError: 'Toolbox' object has no attribute 'selected_source_utterance'

    The Vocode only button below the input box at the top right of the toolbox is also greyed out, so it only synthesizes without any audio output.

    The Toolbox output panel at the bottom left also fails to load its options.

    opened by love530love 0
  • Please update requirements.txt: some packages required by web.py are not covered

    Current version: @main-b402f9d, date: 2022-12-15

    Packages currently missing from requirements.txt:

    fastapi==0.88.0
      pydantic==1.10.2
        typing_extensions==4.4.0
      starlette==0.22.0
        anyio==3.6.2
          idna==3.4
          sniffio==1.3.0
        typing_extensions==4.4.0
    loguru==0.6.0
      colorama==0.4.6
      win32-setctime==1.1.0
    typer==0.7.0
      click==8.0.0
        colorama==0.4.6
    

    Note: in the second-to-last line, the click version should be pinned to 8.0.0, since newer versions raise a "get_os_args" error. The latest versions of the other packages still work.

    opened by Golevka2001 0
Releases: v0.0.1