Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Overview

License: CC BY-NC 4.0

Silero Models

Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks.

Enterprise-grade STT made refreshingly simple (seriously, see benchmarks). We provide quality comparable to Google's STT (and sometimes even better) and we are not Google.

As a bonus:

  • No Kaldi;
  • No compilation;
  • No 20-step instructions;

We have also published TTS models that satisfy the following criteria:

  • One-line usage;
  • A large library of voices;
  • A fully end-to-end pipeline;
  • Natural-sounding speech;
  • No GPU or training required;
  • Minimalism and few dependencies;
  • Faster than real-time on one CPU thread (!!!);
  • Support for 16kHz and 8kHz out of the box;
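"Faster than real-time" can be made concrete with the real-time factor (RTF): processing time divided by the duration of the audio produced, so an RTF below 1.0 means faster than real time. Below is a minimal sketch of that arithmetic; the helper names are hypothetical and not part of the Silero API, and you are assumed to time synthesis yourself.

```python
def audio_duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Length in seconds of a mono signal with num_samples samples at sample_rate Hz."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


def real_time_factor(processing_seconds: float, num_samples: int, sample_rate: int) -> float:
    """RTF = processing time / audio duration; a value below 1.0 is faster than real time."""
    duration = audio_duration_seconds(num_samples, sample_rate)
    if duration == 0:
        raise ValueError("cannot compute RTF for empty audio")
    return processing_seconds / duration


# 0.5 s spent synthesizing 2 s of 16 kHz audio (32,000 samples).
rtf = real_time_factor(0.5, num_samples=32_000, sample_rate=16_000)
print(f"RTF = {rtf:.2f}")  # RTF = 0.25
```

In practice you would measure `processing_seconds` with `time.perf_counter()` around the synthesis call and take `num_samples` from the length of the returned waveform.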

Speech-To-Text

All of the provided models are listed in the models.yml file. Any meta-data and newer versions will be added there.
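The ONNX and TensorFlow examples below index models.yml as `stt_models.<language>.latest.<format>` (for example `models.stt_models.en.latest.onnx`). Here is a minimal sketch of a lookup that fails with a readable message instead of an attribute error, assuming exactly that nesting. It works on a plain dict (for instance the result of `OmegaConf.to_container(OmegaConf.load('models.yml'))`); `get_stt_model_url` is a hypothetical helper and the URLs in the sample data are placeholders, not real model links.

```python
from typing import Any, Mapping


def get_stt_model_url(models: Mapping[str, Any], language: str,
                      fmt: str, version: str = "latest") -> str:
    """Return stt_models[language][version][fmt] (assumed models.yml nesting)."""
    stt = models.get("stt_models", {})
    if language not in stt:
        raise KeyError(f"unknown language {language!r}; available: {sorted(stt)}")
    versions = stt[language]
    if version not in versions:
        raise KeyError(f"unknown version {version!r} for {language!r}; available: {sorted(versions)}")
    formats = versions[version]
    if fmt not in formats:
        raise KeyError(f"no {fmt!r} checkpoint for {language}/{version}; available: {sorted(formats)}")
    return formats[fmt]


# Placeholder data mimicking the assumed structure (not real URLs).
models = {
    "stt_models": {
        "en": {"latest": {"jit": "https://example.com/en.jit",
                          "onnx": "https://example.com/en.onnx"}},
        "de": {"latest": {"jit": "https://example.com/de.jit"}},
    }
}
print(get_stt_model_url(models, "en", "onnx"))  # https://example.com/en.onnx
```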

Currently we provide the following checkpoints:

| | PyTorch | ONNX | Quantization | Quality | Colab |
|---|---|---|---|---|---|
| English (en_v5) | ✔️ | ✔️ | ✔️ | link | Open In Colab |
| German (de_v4) | ✔️ | ✔️ | | link | Open In Colab |
| English (en_v3) | ✔️ | ✔️ | ✔️ | link | Open In Colab |
| German (de_v3) | ✔️ | | | link | Open In Colab |
| German (de_v1) | ✔️ | ✔️ | | link | Open In Colab |
| Spanish (es_v1) | ✔️ | ✔️ | | link | Open In Colab |
| Ukrainian (ua_v3) | ✔️ | ✔️ | ✔️ | N/A | Open In Colab |

Model flavours:

jit jit jit jit jit_q jit_q onnx onnx onnx onnx
xsmall small large xlarge xsmall small xsmall small large xlarge
English en_v5 ✔️ ✔️ ✔️ ✔️ ✔️
English en_v4_0 ✔️ ✔️
English en_v3 ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
German de_v4 ✔️ ✔️
German de_v3 ✔️
German de_v1 ✔️ ✔️
Spanish es_v1 ✔️ ✔️
Ukrainian ua_v3 ✔️ ✔️ ✔️

Dependencies

  • All examples:
    • torch 1.8+ (also used to clone the repo in the TF and ONNX examples); versions older than 1.6 have breaking changes
    • torchaudio, the version matching your PyTorch release should work
    • omegaconf, the latest version should work
  • Additional for ONNX examples:
    • onnx, the latest version should work
    • onnxruntime, the latest version should work
  • Additional for TensorFlow examples:
    • tensorflow, the latest version should work
    • tensorflow_hub, the latest version should work

See the provided Colab notebooks for details on each example below. All examples are maintained to work with the latest major versions of the installed libraries.
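Version floors such as "torch 1.8+" should be checked numerically, because a plain string comparison sorts "1.10" before "1.8". Below is a minimal stdlib-only sketch; `parse_release` and `meets_min_version` are hypothetical helpers that handle only the common `X.Y.Z[suffix]` shapes. For full PEP 440 parsing, `packaging.version.Version` is the usual choice.

```python
import re
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Leading numeric release segment, e.g. '1.10.2+cu113' -> (1, 10, 2).

    Pre-release and local suffixes (a0, .dev1, +cu113) are ignored,
    so '1.8.0a0' is treated the same as '1.8.0'.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_min_version(version: str, minimum: Tuple[int, ...]) -> bool:
    """True if the numeric release of `version` is >= `minimum`."""
    release = parse_release(version)
    width = max(len(release), len(minimum))
    # Zero-pad both tuples so that (1, 8) and (1, 8, 0) compare as equal.
    padded_release = release + (0,) * (width - len(release))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_release >= padded_minimum


print("1.10" >= "1.8")                               # False: string order is not version order
print(meets_min_version("1.10.2+cu113", (1, 8)))     # True
print(meets_min_version("1.7.0a0+e85d494", (1, 8)))  # False
```

With PyTorch installed, `meets_min_version(torch.__version__, (1, 8))` gives a quick sanity check before running the examples.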

PyTorch

import torch
import torchaudio  # audio I/O backend used by the helper utils
from glob import glob

device = torch.device('cpu')  # GPU also works, but our models are fast enough on CPU
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en',  # also available: 'de', 'es'
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see the function signatures for details

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst='speech_orig.wav', progress=True)
test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
model_input = prepare_model_input(read_batch(batches[0]),
                                  device=device)

output = model(model_input)
for example in output:
    print(decoder(example.cpu()))
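`split_into_batches` groups the file list into fixed-size chunks before `read_batch` loads them. Its real implementation lives in the repository's utils. The following is a minimal stdlib sketch of the same idea, assuming it simply chunks the list in order so that the last batch may be shorter; `chunk_files` is a hypothetical name.

```python
from typing import List, Sequence


def chunk_files(files: Sequence[str], batch_size: int = 10) -> List[List[str]]:
    """Split files into consecutive batches of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [list(files[i:i + batch_size]) for i in range(0, len(files), batch_size)]


files = [f"clip_{n}.wav" for n in range(23)]
batches = chunk_files(files, batch_size=10)
print([len(b) for b in batches])  # [10, 10, 3]
```

Each batch would then go through `read_batch` and `prepare_model_input` as above. Smaller batches keep peak memory lower at the cost of more model calls.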

ONNX

You can run our models anywhere you can import an ONNX model or run ONNX Runtime.

import onnx
import torch
import onnxruntime
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual ONNX model
torch.hub.download_url_to_file(models.stt_models[language].latest.onnx, 'model.onnx', progress=True)
onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)
ort_session = onnxruntime.InferenceSession('model.onnx')

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
model_input = prepare_model_input(read_batch(batches[0]))

# actual ONNX inference and decoding
onnx_input = model_input.detach().cpu().numpy()
ort_inputs = {'input': onnx_input}
ort_outs = ort_session.run(None, ort_inputs)
decoded = decoder(torch.Tensor(ort_outs[0])[0])
print(decoded)
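The `decoder` turns the model's per-frame outputs (`ort_outs[0]`) into text. A generic sketch of greedy (best-path) CTC decoding is shown below, assuming per-frame scores over a label list that contains a blank symbol (`'_'` here). The procedure takes the best label in each frame, merges consecutive repeats, and drops blanks. This illustrates the technique only; it is not the repository's `Decoder`, which may handle additional cases.

```python
from typing import List, Sequence


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]],
                      labels: Sequence[str], blank: str = "_") -> str:
    """Greedy CTC decoding over per-frame label scores (logits or probabilities)."""
    blank_idx = labels.index(blank)
    # Best label index for every frame.
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]

    out: List[str] = []
    previous = None
    for idx in best:
        # Collapse runs of the same label; a blank between two equal labels keeps both.
        if idx != previous and idx != blank_idx:
            out.append(labels[idx])
        previous = idx
    return "".join(out)


labels = ["_", " ", "a", "b", "c"]
scores = [
    [0.1, 0.0, 0.8, 0.1, 0.0],  # a
    [0.1, 0.0, 0.7, 0.2, 0.0],  # a (repeat, merged)
    [0.9, 0.0, 0.1, 0.0, 0.0],  # blank
    [0.2, 0.0, 0.6, 0.2, 0.0],  # a (kept: separated by a blank)
    [0.1, 0.0, 0.1, 0.8, 0.0],  # b
    [0.1, 0.0, 0.0, 0.9, 0.0],  # b (repeat, merged)
    [0.7, 0.1, 0.1, 0.1, 0.0],  # blank
]
print(greedy_ctc_decode(scores, labels))  # aab
```

With a PyTorch tensor of shape (frames, labels), calling `.tolist()` first gives input in the shape this sketch expects.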

TensorFlow

SavedModel example

import os
import torch
import subprocess
import tensorflow as tf
import tensorflow_hub as tf_hub
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils using torch.hub for brevity
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual tf model
torch.hub.download_url_to_file(models.stt_models[language].latest.tf, 'tf_model.tar.gz')
subprocess.run('rm -rf tf_model && mkdir tf_model && tar xzfv tf_model.tar.gz -C tf_model',  shell=True, check=True)
tf_model = tf.saved_model.load('tf_model')

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
model_input = prepare_model_input(read_batch(batches[0]))

# TF inference
res = tf_model.signatures["serving_default"](tf.constant(model_input.numpy()))['output_0']
print(decoder(torch.Tensor(res.numpy())[0]))
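The `subprocess.run('rm -rf ... && tar xzfv ...', shell=True)` step assumes a POSIX shell, so it will not work as-is on Windows. A portable sketch of the same step follows, using only `shutil` and `tarfile` from the standard library. `extract_archive` is a hypothetical helper. The path-traversal check and the `filter="data"` argument (used only when the running Python supports it) are defensive additions that the original example does not include.

```python
import os
import shutil
import tarfile
import tempfile


def extract_archive(archive_path: str, target_dir: str) -> None:
    """Recreate target_dir and unpack a .tar.gz into it (rm -rf + mkdir + tar xzf)."""
    shutil.rmtree(target_dir, ignore_errors=True)
    os.makedirs(target_dir)
    root = os.path.realpath(target_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            # Refuse members that would land outside target_dir (e.g. '../evil').
            dest = os.path.realpath(os.path.join(root, member.name))
            if os.path.commonpath([root, dest]) != root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            tar.extractall(root, filter="data")  # available in newer Python releases
        else:
            tar.extractall(root)


# Demo with a throwaway archive standing in for tf_model.tar.gz.
with tempfile.TemporaryDirectory() as tmp:
    payload = os.path.join(tmp, "saved_model.pb")
    with open(payload, "wb") as fh:
        fh.write(b"placeholder")
    archive = os.path.join(tmp, "tf_model.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(payload, arcname="saved_model.pb")
    out_dir = os.path.join(tmp, "tf_model")
    extract_archive(archive, out_dir)
    extracted = sorted(os.listdir(out_dir))

print(extracted)  # ['saved_model.pb']
```

In the TensorFlow example, `extract_archive('tf_model.tar.gz', 'tf_model')` would take the place of the `subprocess.run(...)` line.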

Text-To-Speech

Models and Speakers

All of the provided models are listed in the models.yml file. Any meta-data and newer versions will be added there.

Currently we provide the following speakers:

| Speaker | Auto-stress | Language | Sample rates (Hz) | Colab |
|---|---|---|---|---|
| aidar_v2 | yes | ru (Russian) | 8000, 16000 | Open In Colab |
| baya_v2 | yes | ru (Russian) | 8000, 16000 | Open In Colab |
| irina_v2 | yes | ru (Russian) | 8000, 16000 | Open In Colab |
| kseniya_v2 | yes | ru (Russian) | 8000, 16000 | Open In Colab |
| natasha_v2 | yes | ru (Russian) | 8000, 16000 | Open In Colab |
| ruslan_v2 | yes | ru (Russian) | 8000, 16000 | Open In Colab |
| lj_v2 | no | en (English) | 8000, 16000 | Open In Colab |
| thorsten_v2 | no | de (German) | 8000, 16000 | Open In Colab |
| tux_v2 | no | es (Spanish) | 8000, 16000 | Open In Colab |
| gilles_v2 | no | fr (French) | 8000, 16000 | Open In Colab |
| multi_v2 | no | ru, en, de, es, fr, tt | 8000, 16000 | Open In Colab |
| aigul_v2 | no | ba (Bashkir) | 8000, 16000 | Open In Colab |
| erdni_v2 | no | xal (Kalmyk) | 8000, 16000 | Open In Colab |
| dilyara_v2 | no | tt (Tatar) | 8000, 16000 | Open In Colab |
| dilnavoz_v2 | no | uz (Uzbek) | 8000, 16000 | Open In Colab |

(!!!) In multi_v2, all speakers can speak all of the listed languages (with varying levels of fidelity).
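The speaker table fixes which languages each speaker covers and shows that every v2 speaker supports 8000 and 16000 Hz. Here is a minimal sketch that validates a (speaker, language, sample rate) choice before loading anything. It uses a hand-copied subset of the table, and `SPEAKERS` and `check_tts_request` are hypothetical names, not part of the Silero API.

```python
SUPPORTED_RATES = (8000, 16000)

# Subset of the speaker table: speaker -> languages it is listed for.
SPEAKERS = {
    "kseniya_v2": {"ru"},
    "lj_v2": {"en"},
    "thorsten_v2": {"de"},
    "multi_v2": {"ru", "en", "de", "es", "fr", "tt"},
}


def check_tts_request(speaker: str, language: str, sample_rate: int) -> None:
    """Raise ValueError if the combination is not listed in the speaker table."""
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker {speaker!r}")
    if language not in SPEAKERS[speaker]:
        raise ValueError(f"{speaker} is not listed for language {language!r}")
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"sample_rate must be one of {SUPPORTED_RATES}, got {sample_rate}")


check_tts_request("kseniya_v2", "ru", 16000)  # passes silently
```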

Dependencies

Basic dependencies for the Colab examples:

  • torch, 1.9+;
  • torchaudio, the version matching your PyTorch release should work (required only because the models are hosted together with the STT models; not needed to run them);
  • omegaconf, latest (can also be dropped if you do not load the configs);

PyTorch

import torch

language = 'ru'
speaker = 'kseniya_v2'
sample_rate = 16000
device = torch.device('cpu')

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=speaker)
model.to(device)  # gpu or cpu

audio = model.apply_tts(texts=[example_text],
                        sample_rate=sample_rate)
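You will usually want to write the output of `apply_tts` to disk. Below is a stdlib-only sketch that assumes each returned item is a mono float waveform with samples in [-1.0, 1.0]; check the range for your model version. For a PyTorch tensor, `audio[0].tolist()` gives such a list. `write_wav_pcm16` is a hypothetical helper. If torchaudio is installed, `torchaudio.save` (which expects a 2-D channels × time tensor) is another option.

```python
import math
import os
import struct
import tempfile
import wave
from typing import Sequence


def write_wav_pcm16(path: str, samples: Sequence[float], sample_rate: int) -> None:
    """Write mono float samples (expected in [-1.0, 1.0]) as a 16-bit PCM WAV file."""
    frames = bytearray()
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))  # clip anything outside [-1, 1]
        frames += struct.pack("<h", int(round(x * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


# Demo: 0.5 s of a 440 Hz tone at 16 kHz, standing in for one TTS output.
demo_rate = 16000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / demo_rate) for n in range(demo_rate // 2)]
out_path = os.path.join(tempfile.gettempdir(), "silero_tts_demo.wav")
write_wav_pcm16(out_path, tone, demo_rate)
```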

Standalone Use

  • Standalone usage requires only PyTorch 1.9+ and the Python standard library (users have reported that NumPy may also be needed);
  • Please see the detailed examples in Colab;

import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v2_kseniya.pt',
                                   local_file)  

model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
model.to(device)

example_batch = ['В недрах тундры выдры в г+етрах т+ырят в вёдра ядра кедров.',
                 'Котики - это жидкость!',
                 'М+ама М+илу м+ыла с м+ылом.']
sample_rate = 16000

audio_paths = model.save_wav(texts=example_batch,
                             sample_rate=sample_rate)
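In the Russian example texts, a `+` placed immediately before a vowel marks that vowel as stressed (`г+етрах`, `М+ама`). Below is a minimal sketch of a helper that inserts the marker, given the position of the stressed vowel among the word's vowels. The helper name and the vowel set are assumptions for illustration only.

```python
RUSSIAN_VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")


def mark_stress(word: str, vowel_number: int) -> str:
    """Insert '+' before the vowel_number-th vowel (0-based) of word."""
    seen = 0
    for i, ch in enumerate(word):
        if ch in RUSSIAN_VOWELS:
            if seen == vowel_number:
                return word[:i] + "+" + word[i:]
            seen += 1
    raise ValueError(f"{word!r} has only {seen} vowel(s)")


print(mark_stress("Мама", 0))    # М+ама
print(mark_stress("гетрах", 0))  # г+етрах
```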

FAQ

Wiki

Also check out our wiki.

Performance and Quality

Please refer to these wiki sections:

Adding new Languages

Please refer here.

Contact

Get in Touch

Try our models, create an issue, join our chat, email us, read our news.

Commercial Inquiries

Please see our wiki and tiers for relevant information and email us.

Citations

@misc{SileroModels,
  author = {Silero Team},
  title = {Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-models}},
  commit = {insert_some_commit_here},
  email = {hello@silero.ai}
}

Further reading

English

  • STT:

    • Towards an Imagenet Moment For Speech-To-Text - link
    • A Speech-To-Text Practitioner's Criticisms of Industry and Academia - link
    • Modern Google-level STT Models Released - link
  • TTS:

    • High-Quality Text-to-Speech Made Accessible, Simple and Fast - link
  • VAD:

    • Modern Portable Voice Activity Detector Released - link

Chinese

  • STT:
    • Towards an ImageNet Moment in Speech Recognition - link
    • The Seven Deadly Sins of Academia and Industry in Speech - link

Russian

  • STT

    • We Have Published Modern STT Models Comparable in Quality to Google's - link
    • Lowering the Barriers to Entry in Speech Recognition - link
    • A Huge Open Russian Speech Dataset, Version 1.0 - link
    • How Fast Can an STT System Be Made? - link
    • Our Speech-To-Text System - link
    • Speech To Text - link
  • TTS:

    • We Have Published High-Quality, Simple, Accessible and Fast Speech Synthesis - link
  • VAD:

    • We Have Published a Modern Voice Activity Detector and More - link

Donations

Please use the "sponsor" button.

Comments
  • Feature request - Adding Proper TF 2.0 Checkpoints (not onnx-tensorflow) + Batching + TF JS

    Hello, guys! Your models are brilliant and I want to use them in my project via TensorFlow Serving, but that can't work without batching. Could you please save the models with batching support? Thank you!

    enhancement help wanted 
    opened by aleks73337 28
  • README's Standalone Use section does not mention NumPy

    Currently, https://github.com/snakers4/silero-models#standalone-use states that:

    • Standalone usage just requires PyTorch 1.10+ and python standard library;

    but I had to install NumPy as well to make the example work.

    bug 
    opened by ghost 23
  • Bug report - RuntimeError: Unknown qengine

    Hello. Great project! I would like to test a standard example, but at the line: model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model") I get an error: \lib\site-packages\torch\jit_script.py", line 351, in unpackage_script_module cpp_module = torch._C._import_ir_module_from_package( RuntimeError: Unknown qengine

    Python 10.4, Torch 11.0, device='cpu', Windows 10 Model: torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/ru_v3.pt', local_file)
    Could you please tell me how to fix it?

    bug 
    opened by lik2129 14
  • Bug report - problem loading STT model on Windows

    Hi, I decided to try silero-models. I followed the docs exactly, but I get an error. How can I fix it?

    code:

    import torch
    import zipfile
    import torchaudio
    from glob import glob
    
    device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
    model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                           model='silero_stt',
                                           language='en', # also available 'de', 'es'
                                           device=device)
    

    Error: RuntimeError Traceback (most recent call last) C:\Users\E786~1\AppData\Local\Temp/ipykernel_9444/3004546653.py in 1 device = torch.device('cpu') # gpu also works, but our models are fast enough for CPU ----> 2 model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', 3 model='silero_stt', 4 language='en', # also available 'de', 'es' 5 device=device)

    c:\PY\asistent.venv\lib\site-packages\torch\hub.py in load(repo_or_dir, model, source, force_reload, verbose, skip_validation, *args, **kwargs) 397 repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, verbose, skip_validation) 398 --> 399 model = _load_local(repo_or_dir, model, *args, **kwargs) 400 return model 401

    c:\PY\asistent.venv\lib\site-packages\torch\hub.py in _load_local(hubconf_dir, model, *args, **kwargs) 426 427 entry = _load_entry_from_hubconf(hub_module, model) --> 428 model = entry(*args, **kwargs) 429 430 sys.path.remove(hubconf_dir)

    ~/.cache\torch\hub\snakers4_silero-models_master\hubconf.py in silero_stt(language, version, jit_model, **kwargs) 32 assert language in available_languages 33 ---> 34 model, decoder = init_jit_model(model_url=models.stt_models.get(language).get(version).get(jit_model), 35 **kwargs) 36 utils = (read_batch,

    ~/.cache\torch\hub\snakers4_silero-models_master\utils.py in init_jit_model(model_url, device) 128 progress=True) 129 --> 130 model = torch.jit.load(model_path, map_location=device) 131 model.eval() 132 return model, Decoder(model.labels)

    c:\PY\asistent.venv\lib\site-packages\torch\jit_serialization.py in load(f, map_location, _extra_files) 159 cu = torch._C.CompilationUnit() 160 if isinstance(f, str) or isinstance(f, pathlib.Path): --> 161 cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files) 162 else: 163 cpp_module = torch._C.import_ir_module_from_buffer(

    RuntimeError: open file failed because of errno 2 on fopen: No such file or directory, file path: C:\Users\Дом/.cache\torch\hub\snakers4_silero-models_master\model\en_v5.jit

    bug 
    opened by lev007-ops 13
  • Feature request - SAPI5

    SAPI5 compatibility

    🚀 Feature

    Motivation

    This would mostly serve screen readers on Windows, but SAPI5 is by nature an integration interface. Ready to help!

    enhancement 
    opened by studennikov-serg 11
  • How to obtain an intermediate layer output?

    How do we obtain the output of an intermediate layer of the pre-trained model? For example, the output at the end of the convolution encoder, or the output just after the transformer encoder layers.

    help wanted 
    opened by prajwalkr 11
  • Feature request - Expressiveness

    🚀 Feature

    Right now, in French TTS, there is no decay at the end of a sentence. So if you have two sentences, the prosody is wrong and painful to hear. Each sentence on its own is almost perfect, but at the end of a sentence the pitch should drop, the rate should slow down, and a short pause is needed before the next sentence starts.

    Motivation

    This is useful as soon as you have more than one sentence to synthesize. Otherwise, the current excellent quality of the TTS engine is wasted, since no human speaks continuously across sentences.

    enhancement 
    opened by X-Ryl669 9
  • Errors running example.ipynb locally or in Colab (PyTorch 1.10 issues)

    Hi,

    I am unable to run example.ipynb notebook locally (on CPU machine) or any of the Google Colab notebooks (either on CPU or GPU runtime).

    Following error occurs for example.ipynb notebook:

    model_url = model_conf.get('package')
    
    model_dir = "downloaded_model"
    os.makedirs(model_dir, exist_ok=True)
    model_path = os.path.join(model_dir, os.path.basename(model_url))
    
    if not os.path.isfile(model_path):
        torch.hub.download_url_to_file(model_url,
                                       model_path,
                                       progress=True)
    
    imp = package.PackageImporter(model_path)
    model = imp.load_pickle("te_model", "model")
    example_texts = model.examples
    
    def apply_te(text, lan='en'):
        return model.enhance_text(text, lan)
    
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    /tmp/ipykernel_2498123/2005539933.py in <module>
         10                                    progress=True)
         11 
    ---> 12 imp = package.PackageImporter(model_path)
         13 model = imp.load_pickle("te_model", "model")
         14 example_texts = model.examples
    
    ~/miniconda3/lib/python3.8/site-packages/torch/package/importer.py in __init__(self, file_or_buffer, module_allowed)
         59             self.filename = str(file_or_buffer)
         60             if not os.path.isdir(self.filename):
    ---> 61                 self.zip_reader = torch._C.PyTorchFileReader(self.filename)
         62             else:
         63                 self.zip_reader = MockZipReader(self.filename)
    
    RuntimeError: [enforce fail at inline_container.cc:222] . file not found: v1_4lang_q/version
    

    For any of the Google Colab notebooks, I get the following error when executing the very first cell:

         |████████████████████████████████| 74 kB 2.2 MB/s 
         |████████████████████████████████| 2.9 MB 11.8 MB/s 
         |████████████████████████████████| 112 kB 35.0 MB/s 
         |████████████████████████████████| 596 kB 46.5 MB/s 
      Building wheel for antlr4-python3-runtime (setup.py) ... done
    /content/silero-models
    ---------------------------------------------------------------------------
    OSError                                   Traceback (most recent call last)
    <ipython-input-1-5d873de0231f> in <module>()
         16 from glob import glob
         17 from omegaconf import OmegaConf
    ---> 18 from utils import (init_jit_model, 
         19                    split_into_batches,
         20                    read_audio,
    
    5 frames
    /usr/lib/python3.7/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
        362 
        363         if handle is None:
    --> 364             self._handle = _dlopen(self._name, mode)
        365         else:
        366             self._handle = handle
    
    OSError: libcudart.so.10.2: cannot open shared object file: No such file or directory
    

    As a result, I am unable to run any examples, either locally or in Google Colab.

    Thanks!

    bug 
    opened by abhinavkulkarni 9
  • Bug report - running on ARM / RPI

    🐛 Bug

    I tried to use the model on a Raspberry Pi 3B and got the following error: fft: ATen not compiled with MKL support. So I tried to modify the stft function in torch/functional.py to use the librosa STFT instead, but it seems that the model uses a different torch stft from the one in my installed package.

    The function used instead of torch stft

    def stft(input: Tensor, n_fft: int, hop_length: Optional[int] = None,
             win_length: Optional[int] = None, window: Optional[Tensor] = None,
             center: bool = True, pad_mode: str = 'reflect', normalized: bool = False,
             onesided: Optional[bool] = None, return_complex: Optional[bool] = None):
        S = librosa.stft(np.array(input), n_fft, hop_length, win_length, window, center, pad_mode)
        s_real = np.real(S)
        s_real_shape = np.shape(s_real)
        s_real = np.reshape(s_real, (s_real_shape[0], s_real_shape[1], 1))
        s_imag = np.imag(S)
        s_imag_shape = np.shape(s_imag)
        s_imag = np.reshape(s_imag, (s_imag_shape[0], s_imag_shape[1], 1))
        S = np.concatenate((s_real, s_imag), axis=2)
        return torch.tensor(S)

    stack traces

    File "/home/Salim/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl result = self.forward(*input, **kwargs) RuntimeError: The following operation failed in the TorchScript interpreter. Traceback of TorchScript, serialized code (most recent call last): File "code/torch/stt_pretrained/models/model.py", line 27, in forward _2 = self.win_length _3 = torch.hann_window(self.n_fft, dtype=ops.prim.dtype(x), layout=None, device=ops.prim.device(x), pin_memory=None) x0 = torch.torch.functional.stft(x, _0, _1, _2, _3, True, "reflect", False, True, ) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE _4 = torch.slice(x0, 0, 0, 9223372036854775807, 1) _5 = torch.slice(_4, 1, 0, 9223372036854775807, 1) File "code/torch/torch/functional.py", line 21, in stft input0 = input print("test ok") _2 = torch.stft(input0, n_fft, hop_length, win_length, window, normalized, onesided) ~~~~~~~~~~ <--- HERE return _2

    Traceback of TorchScript, original code (most recent call last): File "/opt/conda/lib/python3.7/site-packages/torch/functional.py", line 465, in stft input = F.pad(input.view(extended_shape), (pad, pad), pad_mode) input = input.view(input.shape[-signal_dim:]) return _VF.stft(input, n_fft, hop_length, win_length, window, normalized, onesided) ~~~~~~~~ <--- HERE RuntimeError: fft: ATen not compiled with MKL support

    Expected behavior

    Is it possible to modify the forward function so that it uses the librosa STFT for Raspberry Pi users?

    Environment

    PyTorch version: 1.7.0a0+e85d494 Is debug build: True CUDA used to build PyTorch: None ROCM used to build PyTorch: N/A

    OS: Raspbian GNU/Linux 10 (buster) (armv7l) GCC version: (Raspbian 8.3.0-6+rpi1) 8.3.0 Clang version: Could not collect CMake version: version 3.13.4

    Python version: 3.7 (32-bit runtime) Is CUDA available: False CUDA runtime version: No CUDA GPU models and configuration: No CUDA Nvidia driver version: No CUDA cuDNN version: No CUDA HIP runtime version: N/A MIOpen runtime version: N/A

    Versions of relevant libraries: [pip3] numpy==1.20.2 [pip3] numpydoc==0.7.0 [pip3] torch==1.7.0a0 [pip3] torchaudio==0.7.0a0+ac17b64 [pip3] torchvision==0.8.0a0+291f7e2 [conda] Could not collect

    bug 
    opened by Salim-alileche 9
  • Feature request - Offline use of model

    At the moment it is nearly impossible to create a docker container that works offline (without internet access). Even if you include this line during docker build:

    RUN python -c "import torch; torch.backends.quantized.engine='qnnpack'; torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_te', force_reload=True)"

    During execution of the docker container (without internet) you load it locally:

    torch.hub.load(repo_or_dir='/root/.cache/torch/hub/snakers4_silero-models_master', model='silero_te', source='local', force_reload=False)

    Then hubconf.py is executed again, which fails without internet access because it tries to download the files referenced at lines 21, 49 and 101 of hubconf.py, even though they already exist.

    So my suggestion would be to also add checks at lines 21, 49 and 101 that skip the download if the file already exists locally (as is already done at line 114).

    Any reasons against that?

    enhancement 
    opened by Phil1108 7
  • Issue with the Silero model when used for text enhancement

    Issue

    File "<torch_package_104>.release_module.py", line 122, in enhance_text File "<torch_package_104>.release_module.py", line 101, in enhance_long_textblock File "<torch_package_104>.release_module.py", line 72, in enhance_textblock File "<torch_package_104>.release_module.py", line 165, in enhance_tokens IndexError: string index out of range

    Details

    I added punctuation to the text using Silero models over the PyTorch hub, and everything was going smoothly until the attached text example appeared. I have no idea why this is occurring. I'm using this model to add punctuation to transcripts that I collect from YouTube; some of them have a few missing punctuation marks (supplied by the video author), while others have no punctuation at all (auto-generated by youtube).

    Transcript throwing Error

    transcript 1: "Hey there. How's it going everybody in this video? We'll be learning about python Data types and specifically We'll be learning about how to work with textual data and textual data in python are represented with strings So we currently have [opened] our intro pi file that we were working with in the last video Where we just printed out hello world and I'll go ahead and run this so that we can see that down here It does print out hello [world] [now] This line here is using the print function and we're passing this text value into that print function now if we wanted to create a Variable that holds that text value then we could say now I'll just get rid of this comment for now So if I wanted a variable to hold that value then I can just create a variable and we'll call that"

    transcript 2: "you're now ready to see how to go one layer of a convolution on your network let's go through the example you've seen in the previous video how to take a 3d volume and convolve it with say two different filters in order to get in this example two different 4x4 outputs so let's say convolving with the first filter gives this first 4x4 output and convolving with this second filter gives a different 4x4 output the final thing to turn this into a convolutional neural net layer is that for each of these we're going to add it bias so this is going to be a real number and what - broadcasting you kind of had the same number - every you know one of these sixteen elements and then apply a non-linearity which for illustration that says there a luna mini arity and this gives you a 4x4 output after applying the bias and the non-linearity and then for this thing at the bottom as well you had some different buyers again this is a real number so you had the same row number - all 16 numbers and then applies some non-linearity that fairly non-linearity and this gives you a different 4x4 output then same as we did before if you take this and stack it up as follows so they end up with a 4 by 4 by 2 output then this computation where you've gone from 6 by 6 by 3 to a 4 by 4 by 4 this is one layer of a convolutional neural network Center mapped is back to one layer of for propagation in the standard neural network when a non convolutional neural network remember that one step afford prot was something like this right z1 equals w1 times a0 a0 was also equal to X right and then plus b1 and he applied the non-linearity to get a 1 so that's G of Z 1"

    Please review the transcripts above and let us know what the problem is.

    opened by Kishan-Sahu 6
  • Model getting stuck on some texts.

    There is no debug message explaining why the model keeps getting stuck for a very long time. Please help us add a debug message to the model so we can identify the cause of the problem.

    The text for which the model stuck is given below:

    Text: "we're going to set this by saying export Python path all uppercase and then equals and now we want to set that location so I'm just going to come over here and grab that location and paste that in those quotes and we want it to look just like that no space in between the equals and the path so to save that we can just hit ctrl X and then Y to save and then enter to keep the same file name and now we can either restart our terminal or run a source command on that file but I'll just restart the terminal here and pull this up and now if we run Python then let's see if we can import that module so import my module and we can see that that worked and the reason that worked is that if we import sis and look at our sis then we can see that after our current directory that we have the directory that was added there and the reason that it's added is that we added it to our Python path environment variable so now let's take a look at how to set"

    I tested manually by removing unusual letters and words, and found that removing "ctrl" from the text fixed the problem.

    opened by Kishan-Sahu 2
  • Feature request - `<phoneme>` support for SSML

    🚀 Feature

    Allow phonetic pronunciation for necessary words

    Motivation

    Sometimes it's necessary to customize the pronunciation of words with non-standard spelling or words borrowed from other languages. In that case, being able to supply a transcription in IPA or X-SAMPA would be nice (see e.g. Polly for an explanation of the syntax).

    Pitch

    Wrapping IPA or X-SAMPA transcription into a <phoneme> tag makes the engine pronounce the word according to its specification.

    Alternatives

    Not sure if there are any within the project. Using other projects supporting <phoneme> is possible.

    Additional context

    enhancement 
    opened by lagleki 1
  • Packaging and PyPI releases

    Hello,

    Thank you for your hard work.

    Is there any chance of getting installable Python package from PyPI for the project?

    For example, it might look like this for installing STT models with PyTorch:

    pip install silero-models-stt[torch]
    

    This would be very handy for using the models in the production projects and environments.

    help wanted 
    opened by espdev 9
  • Feature request - [Wake Word Detection]

    🚀 Feature

    It would be helpful if we could easily use wake word detection to complement the STT functionality. At present I'm using a third-party tool for wake word detection which then records audio for 4 seconds which is processed through silero for home automation purposes.

    Motivation & Pitch

    Adding a simple method for custom wake word detection would allow seamless integration for the purposes of home automation where an always listening device waits for a given wake word or phrase and then listens for a sentence for STT purposes, the text of which is then passed on to a different step in the chain.

    Additionally, while waiting a fixed amount of time for the follow-up sentence is straight-forward, it would be a helpful addition to also use the length of silence in a sentence to determine its termination.

    Alternatives

    These things can be done at present, but only by using multiple tools. Being able to do this in one place would make this use case seamless and easier to handle.

    I do understand if this is too far outside of your scope for this project.

    enhancement 
    opened by waytotheweb 1
Releases (v0.4.1)
  • v0.4.1 (Jun 12, 2022)

    What's Changed

    • Fix models.yml loading by @rominf in https://github.com/snakers4/silero-models/pull/162

    New Contributors

    • @rominf made their first contribution in https://github.com/snakers4/silero-models/pull/162

    Full Changelog: https://github.com/snakers4/silero-models/compare/v0.4...v0.4.1

  • v0.4 (Jun 6, 2022)

    What's Changed

    • Add version 3.1 by @Islanna in https://github.com/snakers4/silero-models/pull/157
    • Fx by @Islanna in https://github.com/snakers4/silero-models/pull/158
    • Fx by @Islanna in https://github.com/snakers4/silero-models/pull/159

    Full Changelog: https://github.com/snakers4/silero-models/compare/v0.3...v0.4

  • v0.3 (May 23, 2022)

    What's Changed

    • Testing the auto-build functionality
    • Update examples by @snakers4 in https://github.com/snakers4/silero-models/pull/137
    • Fx ssml and model loading by @Islanna in https://github.com/snakers4/silero-models/pull/140
    • Update README.md by @Islanna in https://github.com/snakers4/silero-models/pull/138
    • Tts v3 by @Islanna in https://github.com/snakers4/silero-models/pull/141

    Full Changelog: https://github.com/snakers4/silero-models/compare/v0.1...v0.2

  • v0.1 (Feb 28, 2022)

  • v1 (Sep 16, 2020)

    We publish the following models in this release:

    • English V1
    • German V1
    • Spanish V1

    |                 | PyTorch | ONNX | TensorFlow | Quantization | Quality | Colab         |
    |-----------------|---------|------|------------|--------------|---------|---------------|
    | English (en_v1) | ✔️      | ✔️   | ✔️         | ⏳           | link    | Open In Colab |
    | German (de_v1)  | ✔️      | ✔️   | ✔️         | ⏳           | link    | Open In Colab |
    | Spanish (es_v1) | ✔️      | ✔️   | ✔️         | ⏳           | link    | Open In Colab |

Owner: Alexander Veysov