Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Alexander Veysov

Last update: Dec 31, 2022

Related tags

Text Data & NLP text-to-speech german speech pytorch english speech-recognition spanish colab speech-to-text pretrained-models stt asr pretrained onnx stt-benchmark enterprise-grade-stt silero-models tts-models torch-hub

Overview

Silero Models

Silero Models

Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks.

Enterprise-grade STT made refreshingly simple (seriously, see benchmarks). We provide quality comparable to Google's STT (and sometimes even better) and we are not Google.

As a bonus:

No Kaldi;
No compilation;
No 20-step instructions;

Also we have published TTS models that satisfy the following criteria:

One-line usage;
A large library of voices;
A fully end-to-end pipeline;
Naturally sounding speech;
No GPU or training required;
Minimalism and lack of dependencies;
Faster than real-time on one CPU thread (!!!);
Support for 16kHz and 8kHz out of the box;

Speech-To-Text

All of the provided models are listed in the models.yml file. Any meta-data and newer versions will be added there.

Currently we provide the following checkpoints:

	PyTorch	ONNX	Quantization	Quality
English (`en_v5`)	✔️	✔️	✔️	link
German (`de_v4`)	✔️	✔️	⌛	link
English (`en_v3`)	✔️	✔️	✔️	link
German (`de_v3`)	✔️	⌛	⌛	link
German (`de_v1`)	✔️	✔️	⌛	link
Spanish (`es_v1`)	✔️	✔️	⌛	link
Ukrainian (`ua_v3`)	✔️	✔️	✔️	N/A

Model flavours:

	jit	jit	jit	jit	jit_q	jit_q	onnx	onnx	onnx	onnx
	xsmall	small	large	xlarge	xsmall	small	xsmall	small	large	xlarge
English `en_v5`		✔️		✔️		✔️		✔️		✔️
English `en_v4_0`			✔️						✔️
English `en_v3`	✔️	✔️	✔️		✔️	✔️	✔️	✔️	✔️
German `de_v4`			✔️						✔️
German `de_v3`			✔️
German `de_v1`		✔️					✔️
Spanish `es_v1`		✔️					✔️
Ukrainian `ua_v3`		✔️			✔️		✔️

Dependencies

All examples:
- torch, 1.8+ (used to clone the repo in tf and onnx examples), breaking changes for version older than 1.6
- torchaudio, latest version bound to PyTorch should work
- omegaconf, latest just should work
Additional for ONNX examples:
- onnx, latest just should work
- onnxruntime, latest just should work
Additional for TensorFlow examples:
- tensorflow, latest just should work
- tensorflow_hub, latest just should work

Please see the provided Colab for details for each example below. All examples are maintained to work with the latest major packaged versions of the installed libraries.

PyTorch

import torch
import zipfile
import torchaudio
from glob import glob

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en', # also available 'de', 'es'
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see function signature for details

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst ='speech_orig.wav', progress=True)
test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]),
                            device=device)

output = model(input)
for example in output:
    print(decoder(example.cpu()))

ONNX

You can run our model everywhere, where you can import the ONNX model or run ONNX runtime.

import onnx
import torch
import onnxruntime
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual ONNX model
torch.hub.download_url_to_file(models.stt_models.en.latest.onnx, 'model.onnx', progress=True)
onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)
ort_session = onnxruntime.InferenceSession('model.onnx')

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst ='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))

# actual onnx inference and decoding
onnx_input = input.detach().cpu().numpy()
ort_inputs = {'input': onnx_input}
ort_outs = ort_session.run(None, ort_inputs)
decoded = decoder(torch.Tensor(ort_outs[0])[0])
print(decoded)

TensorFlow

SavedModel example

import os
import torch
import subprocess
import tensorflow as tf
import tensorflow_hub as tf_hub
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils using torch.hub for brevity
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual tf model
torch.hub.download_url_to_file(models.stt_models.en.latest.tf, 'tf_model.tar.gz')
subprocess.run('rm -rf tf_model && mkdir tf_model && tar xzfv tf_model.tar.gz -C tf_model',  shell=True, check=True)
tf_model = tf.saved_model.load('tf_model')

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst ='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))

# tf inference
res = tf_model.signatures["serving_default"](tf.constant(input.numpy()))['output_0']
print(decoder(torch.Tensor(res.numpy())[0]))

Text-To-Speech

Models and Speakers

All of the provided models are listed in the models.yml file. Any meta-data and newer versions will be added there.

Currently we provide the following speakers:

Speaker	Auto-stress	Language	SR
`aidar_v2`	yes	`ru` (Russian)	`8000`, `16000`
`baya_v2`	yes	`ru` (Russian)	`8000`, `16000`
`irina_v2`	yes	`ru` (Russian)	`8000`, `16000`
`kseniya_v2`	yes	`ru` (Russian)	`8000`, `16000`
`natasha_v2`	yes	`ru` (Russian)	`8000`, `16000`
`ruslan_v2`	yes	`ru` (Russian)	`8000`, `16000`
`lj_v2`	no	`en` (English)	`8000`, `16000`
`thorsten_v2`	no	`de` (German)	`8000`, `16000`
`tux_v2`	no	`es` (Spanish)	`8000`, `16000`
`gilles_v2`	no	`fr` (French)	`8000`, `16000`
`multi_v2`	no	`ru`, `en`, `de`, `es`, `fr`, `tt`	`8000`, `16000`
`aigul_v2`	no	`ba` (Bashkir)	`8000`, `16000`
`erdni_v2`	no	`xal` (Kalmyk)	`8000`, `16000`
`dilyara_v2`	no	`tt` (Tatar)	`8000`, `16000`
`dilnavoz_v2`	no	`uz` (Uzbek)	`8000`, `16000`

(!!!) In multi_v2 all speakers can speak all of langauges (with various levels of fidelity).

Dependencies

Basic dependencies for colab examples:

torch, 1.9+;
torchaudio, latest version bound to PyTorch should work (required only because models are hosted together with STT, not required for work);
omegaconf, latest (can be removed as well, if you do not load all of the configs);

PyTorch

import torch

language = 'ru'
speaker = 'kseniya_v2'
sample_rate = 16000
device = torch.device('cpu')

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=speaker)
model.to(device)  # gpu or cpu

audio = model.apply_tts(texts=[example_text],
                        sample_rate=sample_rate)

Standalone Use

Standalone usage just requires PyTorch 1.9+ and python standard library;
Please see the detailed examples in Colab;

import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v2_kseniya.pt',
                                   local_file)  

model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
model.to(device)

example_batch = ['В недрах тундры выдры в г+етрах т+ырят в вёдра ядра кедров.',
                 'Котики - это жидкость!',
                 'М+ама М+илу м+ыла с м+ылом.']
sample_rate = 16000

audio_paths = model.save_wav(texts=example_batch,
                             sample_rate=sample_rate)

FAQ

Wiki

Also check out our wiki.

Performance and Quality

Please refer to this wiki sections:

Adding new Languages

Please refer here.

Contact

Get in Touch

Try our models, create an issue, join our chat, email us, read our news.

Commercial Inquiries

Please see our wiki and tiers for relevant information and email us.

Citations

@misc{Silero Models,
  author = {Silero Team},
  title = {Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-models}},
  commit = {insert_some_commit_here},
  email = {hello@silero.ai}
}

Donations

Please use the "sponsor" button.

Comments

Feature request - Adding Proper TF 2.0 Checkpoints (not onnx-tensorflow) + Batching + TF JS

Hello, gyus! Your models are brilliant and I want to use it in my project via tensorflow serving. But it can't works without batching. Can you pleese save models with batching? Thank you!
enhancement help wanted

opened by aleks73337 28
README's Standalone Use misses to mention NumPy
Currently, https://github.com/snakers4/silero-models#standalone-use states that:

Standalone usage just requires PyTorch 1.10+ and python standard library;

but I had to install NumPy as well to make the example work.
bug
opened by ghost 23
Bug report - RuntimeError: Unknown qengine

Hello. Great project! I would like to test a standard example, but at the line: model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model") I get an error: \lib\site-packages\torch\jit_script.py", line 351, in unpackage_script_module cpp_module = torch._C._import_ir_module_from_package( untimeError: Unknown qengine

Python 10.4 , Torch 11.0 , device='cpu', Windows 10 Model: torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/ru_v3.pt', local_file)
Tell me, please, how to fix it?
bug

opened by lik2129 14
Bug report -problem loading STT model on Windows
Hi, I decided to try selero_models, I do everything as in the dock, but I get an error. How to fix?

code:

import torch import zipfile import torchaudio from glob import glob device = torch.device('cpu') # gpu also works, but our models are fast enough for CPU model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language='en', # also available 'de', 'es' device=device)

Error: RuntimeError Traceback (most recent call last) C:\Users\E786~1\AppData\Local\Temp/ipykernel_9444/3004546653.py in 1 device = torch.device('cpu') # gpu also works, but our models are fast enough for CPU ----> 2 model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', 3 model='silero_stt', 4 language='en', # also available 'de', 'es' 5 device=device)

c:\PY\asistent.venv\lib\site-packages\torch\hub.py in load(repo_or_dir, model, source, force_reload, verbose, skip_validation, *args, **kwargs) 397 repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, verbose, skip_validation) 398 --> 399 model = _load_local(repo_or_dir, model, *args, **kwargs) 400 return model 401

c:\PY\asistent.venv\lib\site-packages\torch\hub.py in _load_local(hubconf_dir, model, *args, **kwargs) 426 427 entry = _load_entry_from_hubconf(hub_module, model) --> 428 model = entry(*args, **kwargs) 429 430 sys.path.remove(hubconf_dir)

~/.cache\torch\hub\snakers4_silero-models_master\hubconf.py in silero_stt(language, version, jit_model, **kwargs) 32 assert language in available_languages 33 ---> 34 model, decoder = init_jit_model(model_url=models.stt_models.get(language).get(version).get(jit_model), 35 **kwargs) 36 utils = (read_batch,

~/.cache\torch\hub\snakers4_silero-models_master\utils.py in init_jit_model(model_url, device) 128 progress=True) 129 --> 130 model = torch.jit.load(model_path, map_location=device) 131 model.eval() 132 return model, Decoder(model.labels)

c:\PY\asistent.venv\lib\site-packages\torch\jit_serialization.py in load(f, map_location, _extra_files) 159 cu = torch._C.CompilationUnit() 160 if isinstance(f, str) or isinstance(f, pathlib.Path): --> 161 cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files) 162 else: 163 cpp_module = torch._C.import_ir_module_from_buffer(

RuntimeError: open file failed because of errno 2 on fopen: No such file or directory, file path: C:\Users\Дом/.cache\torch\hub\snakers4_silero-models_master\model\en_v5.jit
bug
opened by lev007-ops 13
Feature request - SAPI5

SAPI5 compatibility

🚀 Feature

Motivation

Mostly enough for screen readers (Windows). But this interface is for integration by its nature. Ready to help!
enhancement

opened by studennikov-serg 11
How to obtain an intermediate layer output?

How do we obtain the output of an intermediate layer of the pre-trained model? For example, the output at the end of the convolution encoder, or the output just after the transformer encoder layers.
help wanted

opened by prajwalkr 11
Feature request - Expressiveness

🚀 Feature

Right now, in French STT, there is no decay upon a end of sentence. So if you have 2 sentences, the prosody is wrong and painful to hear. Each sentence by itself is almost perfect, but upon the end of a sentence, the pitch should decrease, the rate should also decrease and a short pause is required before starting a new sentence.

Motivation

This is useful as soon as you have more than 2 sentences to synthetize. Else, the current, excellent quality of the STT engine is useless, since no human speaks continuously across sentences.
enhancement

opened by X-Ryl669 9

Errors running example.ipynb locally or in Colab (PyTorch 1.10 issues)

Hi,

I am unable to run example.ipynb notebook locally (on CPU machine) or any of the Google Colab notebooks (either on CPU or GPU runtime).

Following error occurs for example.ipynb notebook:

model_url = model_conf.get('package')

model_dir = "downloaded_model"
os.makedirs(model_dir, exist_ok=True)
model_path = os.path.join(model_dir, os.path.basename(model_url))

if not os.path.isfile(model_path):
    torch.hub.download_url_to_file(model_url,
                                   model_path,
                                   progress=True)

imp = package.PackageImporter(model_path)
model = imp.load_pickle("te_model", "model")
example_texts = model.examples

def apply_te(text, lan='en'):
    return model.enhance_text(text, lan)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_2498123/2005539933.py in <module>
     10                                    progress=True)
     11 
---> 12 imp = package.PackageImporter(model_path)
     13 model = imp.load_pickle("te_model", "model")
     14 example_texts = model.examples

~/miniconda3/lib/python3.8/site-packages/torch/package/importer.py in __init__(self, file_or_buffer, module_allowed)
     59             self.filename = str(file_or_buffer)
     60             if not os.path.isdir(self.filename):
---> 61                 self.zip_reader = torch._C.PyTorchFileReader(self.filename)
     62             else:
     63                 self.zip_reader = MockZipReader(self.filename)

RuntimeError: [enforce fail at inline_container.cc:222] . file not found: v1_4lang_q/version

For any of the Google Colab notebooks, I get the following error when executing the very first cell:

     |████████████████████████████████| 74 kB 2.2 MB/s 
     |████████████████████████████████| 2.9 MB 11.8 MB/s 
     |████████████████████████████████| 112 kB 35.0 MB/s 
     |████████████████████████████████| 596 kB 46.5 MB/s 
  Building wheel for antlr4-python3-runtime (setup.py) ... done
/content/silero-models
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-1-5d873de0231f> in <module>()
     16 from glob import glob
     17 from omegaconf import OmegaConf
---> 18 from utils import (init_jit_model, 
     19                    split_into_batches,
     20                    read_audio,

5 frames
/usr/lib/python3.7/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
    362 
    363         if handle is None:
--> 364             self._handle = _dlopen(self._name, mode)
    365         else:
    366             self._handle = handle

OSError: libcudart.so.10.2: cannot open shared object file: No such file or directory

Thus, as a result, I am unable to run any examples - either locally or in Google Colab.

Thanks!

bug

opened by abhinavkulkarni 9

Bug report - running on ARM / RPI

🐛 Bug

I tried to use the model in a Raspberry PI 3B and i get the following error : fft: ATen not compiled with MKL support So i tried to modify the stft function in torch/functional.py to use the librosa stft instead, but it seems that the model use another torch stft instead of this i have on my package.

The function used instead of torch stft

def stft(input: Tensor, n_fft: int, hop_length: Optional[int] = None, win_length: Optional[int] = None, window: Optional[Tensor] = None, center: bool = True, pad_mode: str = 'reflect', normalized: bool = False, onesided: Optional[bool] = None, return_complex: Optional[bool] = None): S = librosa.stft(np.array(input),n_fft,hop_length,win_length,window,center,pad_mode) s_real = np.real(S) s_real_shape = np.shape(s_real) s_real = np.reshape(s_real,(s_real_shape[0],s_real_shape[1],1)) s_imag = np.imag(S) s_imag_shape = np.shape(s_imag) s_imag = np.reshape(s_imag,(s_imag_shape[0],s_imag_shape[1],1)) S = np.concatenate((s_real,s_imag),axis=2) return torch.tensor(S)

stack traces

File "/home/Salim/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl result = self.forward(*input, **kwargs) RuntimeError: The following operation failed in the TorchScript interpreter. Traceback of TorchScript, serialized code (most recent call last): File "code/torch/stt_pretrained/models/model.py", line 27, in forward _2 = self.win_length _3 = torch.hann_window(self.n_fft, dtype=ops.prim.dtype(x), layout=None, device=ops.prim.device(x), pin_memory=None) x0 = torch.torch.functional.stft(x, _0, _1, _2, _3, True, "reflect", False, True, ) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE _4 = torch.slice(x0, 0, 0, 9223372036854775807, 1) _5 = torch.slice(_4, 1, 0, 9223372036854775807, 1) File "code/torch/torch/functional.py", line 21, in stft input0 = input print("test ok") _2 = torch.stft(input0, n_fft, hop_length, win_length, window, normalized, onesided) ~~~~~~~~~~ <--- HERE return _2

Traceback of TorchScript, original code (most recent call last): File "/opt/conda/lib/python3.7/site-packages/torch/functional.py", line 465, in stft input = F.pad(input.view(extended_shape), (pad, pad), pad_mode) input = input.view(input.shape[-signal_dim:]) return _VF.stft(input, n_fft, hop_length, win_length, window, normalized, onesided) ~~~~~~~~ <--- HERE RuntimeError: fft: ATen not compiled with MKL support

Expected behavior

Is it possible to modify the forward function that it will use the librosa stft for the raspberry PIs users ?

Environment

PyTorch version: 1.7.0a0+e85d494 Is debug build: True CUDA used to build PyTorch: None ROCM used to build PyTorch: N/A

OS: Raspbian GNU/Linux 10 (buster) (armv7l) GCC version: (Raspbian 8.3.0-6+rpi1) 8.3.0 Clang version: Could not collect CMake version: version 3.13.4

Python version: 3.7 (32-bit runtime) Is CUDA available: False CUDA runtime version: No CUDA GPU models and configuration: No CUDA Nvidia driver version: No CUDA cuDNN version: No CUDA HIP runtime version: N/A MIOpen runtime version: N/A

Versions of relevant libraries: [pip3] numpy==1.20.2 [pip3] numpydoc==0.7.0 [pip3] torch==1.7.0a0 [pip3] torchaudio==0.7.0a0+ac17b64 [pip3] torchvision==0.8.0a0+291f7e2 [conda] Could not collect
bug

opened by Salim-alileche 9
Feature request - Offline use of model

At the moment it is nearly impossible to create a docker container that works offline (without internet access). Even if you include this line during docker build:

RUN python -c "import torch; torch.backends.quantized.engine='qnnpack'; torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_te', force_reload=True)"

During execution of the docker container (without internet) you load it locally:

torch.hub.load(repo_or_dir='/root/.cache/torch/hub/snakers4_silero-models_master', model='silero_te', source='local', force_reload=False)

Then you have the problem that the hubconf.py is called again (and fails due to no internet access) and it tries to download the files in hubconf.py Lines 21, 49, 101, even though they already exist.

So my suggestion would be to also includes checks in the Lines 21,49,101 to check if the file already exists locally and if yes then skip it (like done in Line 114)

Any reasons against that?
enhancement

opened by Phil1108 7
Issue getting from silero model tried for text enhancement

Issue

File "<torch_package_104>.release_module.py", line 122, in enhance_text File "<torch_package_104>.release_module.py", line 101, in enhance_long_textblock File "<torch_package_104>.release_module.py", line 72, in enhance_textblock File "<torch_package_104>.release_module.py", line 165, in enhance_tokens IndexError: string index out of range

Details

I added punctuation to the text using Silero models over the PyTorch hub, and everything was going smoothly until the attached text example appeared. I have no idea why this is occurring. I'm using this model to add punctuation to transcripts that I collect from YouTube; some of them have a few missing punctuation marks (supplied by the video author), while others have no punctuation at all (auto-generated by youtube).

Transcript throwing Error

transcript 1: ""Hey there. How's it going everybody in this video? We'll be learning about python Data types and specifically We'll be learning about how to work with textual data and textual data in python are represented with strings So we currently have [opened] our intro pi file that we were working with in the last video Where we just printed out hello world and I'll go ahead and run this so that we can see that down here It does print out hello [world] [now] This line here is using the print function and we're passing this text value into that print function now if we wanted to create a Variable that holds that text value then we could say now I'll just get rid of this comment for now So if I wanted a variable to hold that value then I can just create a variable and we'll call that"

transcript 2: "you're now ready to see how to go one layer of a convolution on your network let's go through the example you've seen in the previous video how to take a 3d volume and convolve it with say two different filters in order to get in this example two different 4x4 outputs so let's say convolving with the first filter gives this first 4x4 output and convolving with this second filter gives a different 4x4 output the final thing to turn this into a convolutional neural net layer is that for each of these we're going to add it bias so this is going to be a real number and what - broadcasting you kind of had the same number - every you know one of these sixteen elements and then apply a non-linearity which for illustration that says there a luna mini arity and this gives you a 4x4 output after applying the bias and the non-linearity and then for this thing at the bottom as well you had some different buyers again this is a real number so you had the same row number - all 16 numbers and then applies some non-linearity that fairly non-linearity and this gives you a different 4x4 output then same as we did before if you take this and stack it up as follows so they end up with a 4 by 4 by 2 output then this computation where you've gone from 6 by 6 by 3 to a 4 by 4 by 4 this is one layer of a convolutional neural network Center mapped is back to one layer of for propagation in the standard neural network when a non convolutional neural network remember that one step afford prot was something like this right z1 equals w1 times a0 a0 was also equal to X right and then plus b1 and he applied the non-linearity to get a 1 so that's G of Z 1" Please review the above transcript that is and let us know what the problem is.

opened by Kishan-Sahu 6
Model getting stuck on some texts.

There hasn't been a debugging message to explain why the model keeps getting stuck for a very long period. Please assist us in adding a debugging message to the model so we can identify the cause of the problem.

The text for which the model stuck is given below:

Text: "we're going to set this by saying export Python path all uppercase and then equals and now we want to set that location so I'm just going to come over here and grab that location and paste that in those quotes and we want it to look just like that no space in between the equals and the path so to save that we can just hit ctrl X and then Y to save and then enter to keep the same file name and now we can either restart our terminal or run a source command on that file but I'll just restart the terminal here and pull this up and now if we run Python then let's see if we can import that module so import my module and we can see that that worked and the reason that worked is that if we import sis and look at our sis then we can see that after our current directory that we have the directory that was added there and the reason that it's added is that we added it to our Python path environment variable so now let's take a look at how to set"

I manually tested by eliminating strange letters and words and discovered that removing "ctrl" from text, worked effectively.

opened by Kishan-Sahu 2
Feature request - `` support for SSML

🚀 Feature

Allow phonetic pronunciation for necessary words

Motivation

Sometimes it's necessary to customize pronunciation of words with non-standard spelling or word borrowed from other languages. In that case having transcription in IPA or X-SAMPA would be nice (see e.g. Polly for explanation of the syntax)

Pitch

Wrapping IPA or X-SAMPA transcription into a <phoneme> tag makes the engine pronounce the word according to its specification.

Alternatives

Not sure if there are any within the project. Using other projects supporting <phoneme> is possible.

Additional context
enhancement

opened by lagleki 1
Packaging and PyPI releases
Hello,

Thank you for your hard work.

Is there any chance of getting installable Python package from PyPI for the project?

For example, it might look like this for installing STT models with PyTorch:

pip install silero-models-stt[torch]

This would be very handy for using the models in the production projects and environments.
help wanted
opened by espdev 9
Feature request - [Wake Word Detection]

🚀 Feature

It would be helpful if we could easily use wake word detection to complement the STT functionality. At present I'm using a third-party tool for wake word detection which then records audio for 4 seconds which is processed through silero for home automation purposes.

Motivation & Pitch

Adding a simple method for custom wake word detection would allow seamless integration for the purposes of home automation where an always listening device waits for a given wake word or phrase and then listens for a sentence for STT purposes, the text of which is then passed on to a different step in the chain.

Additionally, while waiting a fixed amount of time for the follow-up sentence is straight-forward, it would be a helpful addition to also use the length of silence in a sentence to determine its termination.

Alternatives

Theses things can be done at present, but by having to use multiple tools. Being able to do this in one place would make this use case seamless and easier to process.

I do understand if this is too far outside of your scope for this project.
enhancement

opened by waytotheweb 1

Releases(v0.4.1)

v0.4.1(Jun 12, 2022)
What's Changed

Fix models.yml loading by @rominf in https://github.com/snakers4/silero-models/pull/162

New Contributors

@rominf made their first contribution in https://github.com/snakers4/silero-models/pull/162

Full Changelog: https://github.com/snakers4/silero-models/compare/v0.4...v0.4.1
Source code(tar.gz)
Source code(zip)
v0.4(Jun 6, 2022)
What's Changed

Add version 3.1 by @Islanna in https://github.com/snakers4/silero-models/pull/157

Fx by @Islanna in https://github.com/snakers4/silero-models/pull/158

Fx by @Islanna in https://github.com/snakers4/silero-models/pull/159

Full Changelog: https://github.com/snakers4/silero-models/compare/v0.3...v0.4
Source code(tar.gz)
Source code(zip)
v0.3(May 23, 2022)
What's Changed

Testing the auto-build functionality

Update examples by @snakers4 in https://github.com/snakers4/silero-models/pull/137

Fx ssml and model loading by @Islanna in https://github.com/snakers4/silero-models/pull/140

Update README.md by @Islanna in https://github.com/snakers4/silero-models/pull/138

Tts v3 by @Islanna in https://github.com/snakers4/silero-models/pull/141

Full Changelog: https://github.com/snakers4/silero-models/compare/v0.1...v0.2
Source code(tar.gz)
Source code(zip)
v0.1(Feb 28, 2022)

This is a test release to test Github Action based PyPI publishing. Some proper semantic version will be created later.
Source code(tar.gz)
Source code(zip)
v1(Sep 16, 2020)
We publish the following models in this release:

English V1

German V1

Spanish V1

| | PyTorch | ONNX | TensorFlow | Quantization | Quality | Colab | |-----------------|--------------------|--------------------|--------------------|--------------|---------|-------| | English (en_v1) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :hourglass: | link | | | German (de_v1) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :hourglass: | link | | | Spanish (es_v1) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :hourglass: | link | |
Source code(tar.gz)
Source code(zip)

Owner

Alexander Veysov

GitHub

BPEmb is a collection of pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) and trained on Wikipedia.

BPEmb is a collection of pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) and trained on Wikipedia. Its intended use is as input for neural models in natural language processing.

1.1k Jan 3, 2023

RoNER is a Named Entity Recognition model based on a pre-trained BERT transformer model trained on RONECv2

RoNER RoNER is a Named Entity Recognition model based on a pre-trained BERT transformer model trained on RONECv2. It is meant to be an easy to use, hi

9 Nov 7, 2022

Use PaddlePaddle to reproduce the paper：mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

MT5_paddle Use PaddlePaddle to reproduce the paper：mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer English | 简体中文 mT5: A Massively

2 Oct 17, 2021

A Python module made to simplify the usage of Text To Speech and Speech Recognition.

Nav Module The solution for voice related stuff in Python Nav is a Python module which simplifies voice related stuff in Python. Just import the Modul

1 Dec 20, 2021

PyTorch Implementation of "Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging" (Findings of ACL 2022)

Feature_CRF_AE Feature_CRF_AE provides a implementation of Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging

6 Apr 29, 2022

Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper

Data Augmentation using Pre-trained Transformer Models Code associated with the Data Augmentation using Pre-trained Transformer Models paper Code cont

44 Dec 31, 2022

Must-read papers on improving efficiency for pre-trained language models.

89 Jan 3, 2023

The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models

Graformer The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models Graformer (also named BridgeTransformer in t

22 Dec 14, 2022

Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks

Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks, which modifies the input text with a textual template and directly uses PLMs to conduct pre-trained tasks. This library provides a standard, flexible and extensible framework to deploy the prompt-learning pipeline. OpenPrompt supports loading PLMs directly from huggingface transformers. In the future, we will also support PLMs implemented by other libraries.

2.3k Jan 8, 2023

Chinese Pre-Trained Language Models (CPM-LM) Version-I

CPM-Generate 为了促进中文自然语言处理研究的发展，本项目提供了 CPM-LM (2.6B) 模型的文本生成代码，可用于文本生成的本地测试，并以此为基础进一步研究零次学习/少次学习等场景。[项目首页] [模型下载] [技术报告] 若您想使用CPM-1进行推理，我们建议使用高效推理工具BMI

1.4k Jan 3, 2023

TweebankNLP - Pre-trained Tweet NLP Pipeline (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Models + Tweebank-NER

TweebankNLP This repo contains the new Tweebank-NER dataset and Twitter-Stanza p

84 Dec 20, 2022

Guide to using pre-trained large language models of source code

Large Models of Source Code I occasionally train and publicly release large neural language models on programs, including PolyCoder. Here, I describe

947 Dec 28, 2022

Simple Speech to Text, Text to Speech

Simple Speech to Text, Text to Speech 1. Download Repository Opsi 1 Download repository ini, extract di lokasi yang diinginkan Opsi 2 Jika sudah famil

5 Dec 28, 2021

This is a really simple text-to-speech app made with python and tkinter.

Tkinter Text-to-Speech App by Souvik Roy This is a really simple tkinter app which converts the text you have entered into a speech. It is created wit

1 Dec 21, 2021

Google and Stanford University released a new pre-trained model called ELECTRA

Google and Stanford University released a new pre-trained model called ELECTRA, which has a much compact model size and relatively competitive performance compared to BERT and its variants. For further accelerating the research of the Chinese pre-trained model, the Joint Laboratory of HIT and iFLYTEK Research (HFL) has released the Chinese ELECTRA models based on the official code of ELECTRA. ELECTRA-small could reach similar or even higher scores on several NLP tasks with only 1/10 parameters compared to BERT and its variants.

1.2k Dec 30, 2022

Implementation of Natural Language Code Search in the project CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

CodeBERT-Implementation In this repo we have replicated the paper CodeBERT: A Pre-Trained Model for Programming and Natural Languages. We are interest

4 Jul 1, 2022

PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

1k Dec 30, 2022

TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset.

TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset. TunBERT was applied to three NLP downstream tasks: Sentiment Analysis (SA), Tunisian Dialect Identification (TDI) and Reading Comprehension Question-Answering (RCQA)

72 Dec 9, 2022

DziriBERT: a Pre-trained Language Model for the Algerian Dialect

DziriBERT is the first Transformer-based Language Model that has been pre-trained specifically for the Algerian Dialect.

117 Jan 7, 2023

Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Related tags

Overview

Silero Models

Speech-To-Text

Dependencies

PyTorch

ONNX

TensorFlow

Text-To-Speech

Models and Speakers

Dependencies

PyTorch

Standalone Use

FAQ

Wiki

Performance and Quality

Adding new Languages

Contact

Get in Touch

Commercial Inquiries

Citations

Further reading

English

Chinese

Russian

Donations

Comments

🚀 Feature

Motivation

🚀 Feature

Motivation

🐛 Bug

The function used instead of torch stft

stack traces

Expected behavior

Environment

Issue

Details

Transcript throwing Error

🚀 Feature

Motivation

Pitch

Alternatives

Additional context

🚀 Feature

Motivation & Pitch

Alternatives

Releases(v0.4.1)

v0.4.1(Jun 12, 2022)

What's Changed

New Contributors

v0.4(Jun 6, 2022)

What's Changed

v0.3(May 23, 2022)

What's Changed

v0.1(Feb 28, 2022)

v1(Sep 16, 2020)

Owner

Alexander Veysov

BPEmb is a collection of pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) and trained on Wikipedia.

RoNER is a Named Entity Recognition model based on a pre-trained BERT transformer model trained on RONECv2

Use PaddlePaddle to reproduce the paper：mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

A Python module made to simplify the usage of Text To Speech and Speech Recognition.

PyTorch Implementation of "Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging" (Findings of ACL 2022)

Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper

Must-read papers on improving efficiency for pre-trained language models.

The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models

Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks

Chinese Pre-Trained Language Models (CPM-LM) Version-I

TweebankNLP - Pre-trained Tweet NLP Pipeline (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Models + Tweebank-NER

Guide to using pre-trained large language models of source code

Simple Speech to Text, Text to Speech

This is a really simple text-to-speech app made with python and tkinter.

Google and Stanford University released a new pre-trained model called ELECTRA

Implementation of Natural Language Code Search in the project CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset.

DziriBERT: a Pre-trained Language Model for the Algerian Dialect