A Python wrapper for simple offline real-time dictation (speech-to-text) and speaker-recognition using Vosk.

Last update: Jun 19, 2022

Overview

Simple-Vosk

A Python wrapper for simple offline real-time dictation (speech-to-text) and speaker-recognition using Vosk. Check out the official Vosk GitHub page for the original API (documentation + support for other languages).

This module makes it quick and easy to get a simple Vosk implementation running. It is intended for rapid prototyping and experimentation, not for production use.

For example, I used this module in a quick personal-assistant program.

Features

Uses Vosk: lightweight, multilingual, offline, and fast speech recognition.
Runs in background thread (non-blocking).
Both complete-sentence and real-time outputs.
Optional speaker-recognition (using X-Vectors).
Configurable filter-phrase list (eliminate common false outputs).
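
The filter-phrase idea can be illustrated independently of the module. The sketch below is not Simple-Vosk's own implementation: `make_filtered_callback` and the example phrases are hypothetical, chosen only for illustration. It wraps a callback that takes the same `(txt, spk, full)` arguments as the one in the Examples section and drops any result whose text matches a filtered phrase.

```python
def make_filtered_callback(callback, filter_phrases):
    """Wrap callback so results matching a filtered phrase are dropped.

    Hypothetical helper for illustration; not part of Simple-Vosk.
    """
    # Normalize once so matching ignores case and surrounding whitespace.
    blocked = {phrase.strip().lower() for phrase in filter_phrases}

    def filtered(txt, spk, full):
        if txt.strip().lower() in blocked:
            return  # treated as a false output, so it is ignored
        callback(txt, spk, full)

    return filtered


# Usage: collect results while ignoring two example phrases.
heard = []
on_text = make_filtered_callback(
    lambda txt, spk, full: heard.append(txt), ["the", "huh"]
)
on_text("the", None, True)                 # dropped
on_text("turn on the lights", None, True)  # kept
```

Matching on the whole normalized string keeps the filter from removing legitimate sentences that merely contain a filtered word.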

Requirements

Should work with Python 3.6+. Tested with Python 3.8.7 on Windows 10 1903.

Python modules (see requirements.txt):

vosk
sounddevice
numpy

You will also need to download Vosk models: one for your language of choice and, if desired, the speaker-recognition model. Both can be found on the Vosk models page. If you don't use speaker recognition, you only need the language model.
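
Before running anything, it can help to confirm that the three modules are importable and that a model folder is in place. This is a minimal sketch: `missing_modules` and `model_dir_exists` are hypothetical helpers, and the folder name `model` is an assumption taken from the `model="model"` argument in the example, which may not match where you unpacked your model.

```python
import importlib.util
from pathlib import Path

REQUIRED_MODULES = ("vosk", "sounddevice", "numpy")


def missing_modules(names=REQUIRED_MODULES):
    """Return the listed modules that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def model_dir_exists(path="model"):
    """Return True if the model folder exists (the default path is an assumption)."""
    return Path(path).is_dir()


if __name__ == "__main__":
    print("Missing modules:", missing_modules() or "none")
    print("Model folder found:", model_dir_exists())
```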

Examples

This repository contains some usage examples: ExampleSimpleDictation.py, ExampleSpeakerRecognition.py, and ExampleNonBlocking.py. Check the Documentation.md file for more in-depth info.

Below is the simplest implementation needed to get a fully functioning speech-recognition system.

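import simpleVosk as sv

# Callback that receives each result; only the text is printed here,
# and the spk and full arguments are ignored.
def prnt(txt, spk, full):
	print(txt)

# Pass the callback and the model to use, then start in blocking mode.
s = sv.Speech(callback=prnt, model="model")
s.run(blocking=True)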
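
Because recognition runs in a background thread, a callback may prefer to hand results to the main thread rather than act on them directly. The sketch below assumes the callback keeps the `(txt, spk, full)` signature shown above. The queue, `enqueue`, and `drain` are hypothetical names, and a plain thread stands in for the recognizer (see ExampleNonBlocking.py for the module's actual non-blocking usage).

```python
import queue
import threading

results = queue.Queue()  # thread-safe hand-off between recognizer and main thread


def enqueue(txt, spk, full):
    """Callback: store the recognized text instead of printing it."""
    results.put(txt)


def drain(q):
    """Return everything currently waiting in the queue without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


# Stand-in for the recognizer thread, which would call the callback itself.
worker = threading.Thread(target=enqueue, args=("hello world", None, True))
worker.start()
worker.join()
pending = drain(results)
```

The main loop can call `drain(results)` whenever convenient, so slow handling of a result never happens inside the callback.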

Troubleshooting

Make sure your default input device is working, and/or ensure you are passing the correct DeviceID to the Speech object. You can see device IDs with the listDevices() method in simpleVosk.py.
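
Since sounddevice is one of the required modules, it can also be queried directly to see which devices accept input. This is a sketch, not the module's `listDevices()` implementation: `input_devices` and `list_input_devices` are hypothetical helpers, and it is an assumption that the indices printed here correspond to the DeviceID that the Speech object expects.

```python
def input_devices(devices):
    """Return (index, name) pairs for devices that can record audio."""
    return [
        (index, dev["name"])
        for index, dev in enumerate(devices)
        if dev["max_input_channels"] > 0
    ]


def list_input_devices():
    """Query PortAudio through sounddevice and print usable input devices."""
    import sounddevice as sd  # imported lazily; needs a working PortAudio install

    for index, name in input_devices(sd.query_devices()):
        print(index, name)
```

Calling `list_input_devices()` prints one line per input-capable device. If the list is empty, the problem is likely with the system's audio setup rather than with the DeviceID.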

Make sure you have Windows microphone access enabled. Having this disabled can cause errors similar to this: sounddevice.PortAudioError: Error opening RawInputStream: Unanticipated host error [PaErrorCode -9999]: 'Undefined external error.' [MME error 1]

A Note on Conventions

This project goes against some standard Python conventions:

It uses camelCase for naming methods (and files) rather than snake_case.
Tabs are used rather than 4 spaces for indentation (as I am a sane human being).
It uses non-standard docstring formats.

Future Plans

Add ability to add custom words/phrases (KaldiRecognizer appears to only accept replacement dictionaries)
Use proper docstrings

Comments

How to install/import

Hi,

I am new to Python and Linux, so please accept my apologies if this question seems trivial.

In the examples for running this package, you have only written this:


import simpleVosk as sv
def prnt(txt, spk, full):
	print(txt)

s = sv.Speech(callback=prnt, model="model")
s.run(blocking=True)

But the point is how to install via pip or import your package. I tried to install via pip and github but there were certain errors.

 pip install git+https://github.com/Kenneract/Simple-Vosk.git
Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/Kenneract/Simple-Vosk.git
  Cloning https://github.com/Kenneract/Simple-Vosk.git to /tmp/pip-req-build-zdj6s7jm
  Running command git clone --filter=blob:none --quiet https://github.com/Kenneract/Simple-Vosk.git /tmp/pip-req-build-zdj6s7jm
  Resolved https://github.com/Kenneract/Simple-Vosk.git to commit fd4129cbb6a094286b083c5a39582040614923e8
ERROR: git+https://github.com/Kenneract/Simple-Vosk.git does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.

I hope my question is clear.

Thanks

opened by furqan915 1
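
The error above indicates that the repository is not packaged for pip (it has neither setup.py nor pyproject.toml), so the module has to be imported from a local copy. One possible workaround is sketched below. It assumes the repository has been cloned to a local folder, that simpleVosk.py sits at the top level of that folder, and that the packages from requirements.txt are already installed. `add_module_dir` and the folder name are hypothetical.

```python
import sys
from pathlib import Path


def add_module_dir(repo_dir):
    """Put a cloned repository folder on sys.path so its modules can be imported."""
    path = str(Path(repo_dir).resolve())
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


# Example (assumes the repository was cloned into ./Simple-Vosk):
# add_module_dir("Simple-Vosk")
# import simpleVosk as sv
```

Copying simpleVosk.py next to your own script achieves the same result without touching `sys.path`.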
