Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Overview

License: CC BY-NC 4.0

Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks.

Enterprise-grade STT made refreshingly simple (seriously, see benchmarks). We provide quality comparable to Google's STT (and sometimes even better) and we are not Google.

As a bonus:

  • No Kaldi;
  • No compilation;
  • No 20-step instructions;

We have also published TTS models that satisfy the following criteria:

  • One-line usage;
  • A large library of voices;
  • A fully end-to-end pipeline;
  • Natural-sounding speech;
  • No GPU or training required;
  • Minimalism and lack of dependencies;
  • Faster than real-time on one CPU thread (!!!);
  • Support for 16kHz and 8kHz out of the box;

Speech-To-Text

All of the provided models are listed in the models.yml file. All metadata and newer versions will be added there.
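For orientation, the registry is organized roughly like the fragment below. This is an illustrative sketch only: the field names are inferred from the loader snippets later in this README, and the URLs are placeholders, not real download links.

```yaml
# illustrative fragment, not the real models.yml
stt_models:
  en:
    latest:
      jit: <model-url>      # TorchScript checkpoint
      onnx: <model-url>     # ONNX export
      tf: <model-url>       # TensorFlow SavedModel archive
```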


Currently we provide the following checkpoints:

| | PyTorch | ONNX | Quantization | Quality | Colab |
|-------------------|------|------|------|------|---------------|
| English (en_v5)   | ✔️ | ✔️ | ✔️ | link | Open In Colab |
| German (de_v4)    | ✔️ | ✔️ |    | link | Open In Colab |
| English (en_v3)   | ✔️ | ✔️ | ✔️ | link | Open In Colab |
| German (de_v3)    | ✔️ |    |    | link | Open In Colab |
| German (de_v1)    | ✔️ | ✔️ |    | link | Open In Colab |
| Spanish (es_v1)   | ✔️ | ✔️ |    | link | Open In Colab |
| Ukrainian (ua_v3) | ✔️ | ✔️ | ✔️ | N/A  | Open In Colab |

Model flavours (columns are jit xsmall / small / large / xlarge, jit_q xsmall / small, and onnx xsmall / small / large / xlarge):
English en_v5 ✔️ ✔️ ✔️ ✔️ ✔️
English en_v4_0 ✔️ ✔️
English en_v3 ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
German de_v4 ✔️ ✔️
German de_v3 ✔️
German de_v1 ✔️ ✔️
Spanish es_v1 ✔️ ✔️
Ukrainian ua_v3 ✔️ ✔️ ✔️

Dependencies

  • All examples:
    • torch 1.8+ (used to clone the repo in the TF and ONNX examples; versions older than 1.6 have breaking changes)
    • torchaudio, the latest version bound to your PyTorch release should work
    • omegaconf, the latest version should just work
  • Additional for ONNX examples:
    • onnx, the latest version should just work
    • onnxruntime, the latest version should just work
  • Additional for TensorFlow examples:
    • tensorflow, the latest version should just work
    • tensorflow_hub, the latest version should just work

Please see the provided Colab notebooks for the details of each example below. All examples are maintained to work with the latest major released versions of the listed libraries.

PyTorch

Open In Colab

Open on Torch Hub

import torch
import zipfile
import torchaudio
from glob import glob

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en', # also available 'de', 'es'
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see function signature for details

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst='speech_orig.wav', progress=True)
test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]),
                            device=device)

output = model(input)
for example in output:
    print(decoder(example.cpu()))
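The split_into_batches helper above groups the file list into batches of at most batch_size entries before they are read and padded into a model input. A minimal pure-Python sketch of that idea (split_into_batches_sketch is a hypothetical stand-in, not the library function, and it omits any length-based ordering the real utility may apply):

```python
def split_into_batches_sketch(files, batch_size=10):
    # Chunk a flat list of audio paths into batches of at most batch_size.
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]

batches = split_into_batches_sketch([f'clip_{i}.wav' for i in range(25)], batch_size=10)
print([len(b) for b in batches])  # → [10, 10, 5]
```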

ONNX

Open In Colab

You can run our models anywhere you can import an ONNX model or run the ONNX runtime.

import onnx
import torch
import onnxruntime
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual ONNX model
torch.hub.download_url_to_file(models.stt_models[language].latest.onnx, 'model.onnx', progress=True)
onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)
ort_session = onnxruntime.InferenceSession('model.onnx')

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))

# actual onnx inference and decoding
onnx_input = input.detach().cpu().numpy()
ort_inputs = {'input': onnx_input}
ort_outs = ort_session.run(None, ort_inputs)
decoded = decoder(torch.Tensor(ort_outs[0])[0])
print(decoded)
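The decoder in the snippet above turns per-frame token scores into text via a greedy CTC-style collapse: repeated labels are merged and blanks dropped. A toy illustration of that decoding scheme (not silero's actual implementation; the token table is made up):

```python
def greedy_ctc_decode(frame_ids, blank=0, id2char=None):
    # Standard greedy CTC collapse: merge repeated frame labels, drop blanks.
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return ''.join(id2char[i] for i in out) if id2char else out

# frames: h h _ e _ l l _ l o  →  "hello"
print(greedy_ctc_decode([1, 1, 0, 2, 0, 3, 3, 0, 3, 4],
                        id2char={1: 'h', 2: 'e', 3: 'l', 4: 'o'}))
```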

TensorFlow

Open In Colab

SavedModel example

import os
import torch
import subprocess
import tensorflow as tf
import tensorflow_hub as tf_hub
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils using torch.hub for brevity
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual tf model
torch.hub.download_url_to_file(models.stt_models[language].latest.tf, 'tf_model.tar.gz')
subprocess.run('rm -rf tf_model && mkdir tf_model && tar xzfv tf_model.tar.gz -C tf_model', shell=True, check=True)
tf_model = tf.saved_model.load('tf_model')

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))

# tf inference
res = tf_model.signatures["serving_default"](tf.constant(input.numpy()))['output_0']
print(decoder(torch.Tensor(res.numpy())[0]))

Text-To-Speech

Models and Speakers

All of the provided models are listed in the models.yml file. Any meta-data and newer versions will be added there.

Currently we provide the following speakers:

| Speaker     | Auto-stress | Language               | SR          | Colab         |
|-------------|-------------|------------------------|-------------|---------------|
| aidar_v2    | yes | ru (Russian)           | 8000, 16000 | Open In Colab |
| baya_v2     | yes | ru (Russian)           | 8000, 16000 | Open In Colab |
| irina_v2    | yes | ru (Russian)           | 8000, 16000 | Open In Colab |
| kseniya_v2  | yes | ru (Russian)           | 8000, 16000 | Open In Colab |
| natasha_v2  | yes | ru (Russian)           | 8000, 16000 | Open In Colab |
| ruslan_v2   | yes | ru (Russian)           | 8000, 16000 | Open In Colab |
| lj_v2       | no  | en (English)           | 8000, 16000 | Open In Colab |
| thorsten_v2 | no  | de (German)            | 8000, 16000 | Open In Colab |
| tux_v2      | no  | es (Spanish)           | 8000, 16000 | Open In Colab |
| gilles_v2   | no  | fr (French)            | 8000, 16000 | Open In Colab |
| multi_v2    | no  | ru, en, de, es, fr, tt | 8000, 16000 | Open In Colab |
| aigul_v2    | no  | ba (Bashkir)           | 8000, 16000 | Open In Colab |
| erdni_v2    | no  | xal (Kalmyk)           | 8000, 16000 | Open In Colab |
| dilyara_v2  | no  | tt (Tatar)             | 8000, 16000 | Open In Colab |
| dilnavoz_v2 | no  | uz (Uzbek)             | 8000, 16000 | Open In Colab |

(!!!) In multi_v2, all speakers can speak all of the languages (with varying levels of fidelity).

Dependencies

Basic dependencies for colab examples:

  • torch, 1.9+;
  • torchaudio, the latest version bound to your PyTorch release should work (required only because the models are hosted together with the STT models; not needed for inference);
  • omegaconf, latest (can be removed as well if you do not load all of the configs);

PyTorch

Open In Colab

Open on Torch Hub

import torch

language = 'ru'
speaker = 'kseniya_v2'
sample_rate = 16000
device = torch.device('cpu')

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=speaker)
model.to(device)  # gpu or cpu

audio = model.apply_tts(texts=[example_text],
                        sample_rate=sample_rate)
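apply_tts returns raw audio rather than files. Assuming the returned tensors hold mono float samples in [-1, 1] (an assumption about the output format, not a documented guarantee), they can be persisted with only the standard library; save_wav_sketch below is illustrative, not a silero API:

```python
import struct
import wave

def save_wav_sketch(samples, path, sample_rate=16000):
    # samples: iterable of floats in [-1, 1]; written as 16-bit mono PCM
    with wave.open(path, 'wb') as f:
        f.setnchannels(1)
        f.setsampwidth(2)           # 16-bit
        f.setframerate(sample_rate)
        f.writeframes(b''.join(
            struct.pack('<h', int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples))

# with the example above: save_wav_sketch(audio[0].tolist(), 'tts_output.wav')
```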

Standalone Use

  • Standalone usage requires only PyTorch 1.9+ and the Python standard library;
  • Please see the detailed examples in Colab;
import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v2_kseniya.pt',
                                   local_file)  

model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
model.to(device)

example_batch = ['В недрах тундры выдры в г+етрах т+ырят в вёдра ядра кедров.',
                 'Котики - это жидкость!',
                 'М+ама М+илу м+ыла с м+ылом.']
sample_rate = 16000

audio_paths = model.save_wav(texts=example_batch,
                             sample_rate=sample_rate)

FAQ

Wiki

Also check out our wiki.

Performance and Quality

Please refer to the corresponding wiki sections.

Adding new Languages

Please refer here.

Contact

Get in Touch

Try our models, create an issue, join our chat, email us, read our news.

Commercial Inquiries

Please see our wiki and tiers for relevant information and email us.

Citations

@misc{Silero_Models,
  author = {Silero Team},
  title = {Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-models}},
  commit = {insert_some_commit_here},
  email = {hello@silero.ai}
}

Further reading

English

  • STT:

    • Towards an Imagenet Moment For Speech-To-Text - link
    • A Speech-To-Text Practitioners Criticisms of Industry and Academia - link
    • Modern Google-level STT Models Released - link
  • TTS:

    • High-Quality Text-to-Speech Made Accessible, Simple and Fast - link
  • VAD:

    • Modern Portable Voice Activity Detector Released - link

Chinese

  • STT:
    • 迈向语音识别领域的 ImageNet 时刻 - link
    • 语音领域学术界和工业界的七宗罪 - link

Russian

  • STT

    • Мы опубликовали современные STT модели сравнимые по качеству с Google - link
    • Понижаем барьеры на вход в распознавание речи - link
    • Огромный открытый датасет русской речи версия 1.0 - link
    • Насколько Быстрой Можно Сделать Систему STT? - link
    • Наша система Speech-To-Text - link
    • Speech To Text - link
  • TTS:

    • Мы Опубликовали Качественный, Простой, Доступный и Быстрый Синтез Речи - link
  • VAD:

    • Мы опубликовали современный Voice Activity Detector и не только - link

Donations

Please use the "sponsor" button.

Releases(v1)
  • v1 (Sep 16, 2020)


    We publish the following models in this release:

    • English V1
    • German V1
    • Spanish V1

    | | PyTorch | ONNX | TensorFlow | Quantization | Quality | Colab |
    |-----------------|----|----|----|----|------|---------------|
    | English (en_v1) | ✔️ | ✔️ | ✔️ | ⏳ | link | Open In Colab |
    | German (de_v1)  | ✔️ | ✔️ | ✔️ | ⏳ | link | Open In Colab |
    | Spanish (es_v1) | ✔️ | ✔️ | ✔️ | ⏳ | link | Open In Colab |

    Source code(tar.gz)
    Source code(zip)