A Python wrapper for simple offline real-time dictation (speech-to-text) and speaker-recognition using Vosk.

Overview

Simple-Vosk

A Python wrapper for simple offline real-time dictation (speech-to-text) and speaker-recognition using Vosk. Check out the official Vosk GitHub page for the original API (documentation + support for other languages).

This module was created to make using a simple implementation of Vosk very quick and easy. It is intended for rapid prototyping and experimenting; not for production use.

For example, I used this module in a quick personal-assistant program.

Features

  • Uses Vosk: lightweight, multilingual, offline, and fast speech recognition.
  • Runs in background thread (non-blocking).
  • Both complete-sentence and real-time outputs.
  • Optional speaker-recognition (using X-Vectors).
  • Configurable filter-phrase list (eliminate common false outputs).

Requirements

Should work with Python 3.6+. Tested with Python 3.8.7 on Windows 10 1903.

Python Modules: (see requirements.txt)

  • vosk
  • sounddevice
  • numpy

You will also need to download Vosk models; one for your language of choice, and (if desired) the speaker-recognition model. Both can be found on the Vosk models page. If you don't use speaker recognition, you only need the one model.

Examples

This repository contains some examples of usage; ExampleSimpleDictation.py, ExampleSpeakerRecognition.py, and ExampleNonBlocking.py. Check the Documentation.md file for more in-depth info.

Below is the simplest implementation to get a fully-functioning speech-recognition system.

import simpleVosk as sv

def prnt(txt, spk, full):
	print(txt)

s = sv.Speech(callback=prnt, model="model")
s.run(blocking=True)

Troubleshooting

Make sure your default input device is working, and/or ensure you are passing the correct DeviceID to the Speech object. You can see device IDs with the listDevices() method in simpleVosk.py.

Make sure you have Windows microphone access enabled. Having this disabled can cause errors similar to this: sounddevice.PortAudioError: Error opening RawInputStream: Unanticipated host error [PaErrorCode -9999]: 'Undefined external error.' [MME error 1]

A Note on Conventions

This project goes against some standard Python conventions:

  • It uses camelCase for naming methods (and files) rather than snake_case
  • Tabs are used rather than 4 spaces for indentation (as I am a sane human being)
  • Non-standard docstring formats are being used

Future Plans

  • Add ability to add custom words/phrases (KaldiRecognizer appears to only accept replacement dictionaries)
  • Use proper docstrings
You might also like...
Utility for Google Text-To-Speech batch audio files generator. Ideal for prompt files creation with Google voices for application in offline IVRs

Google Text-To-Speech Batch Prompt File Maker Are you in the need of IVR prompts, but you have no voice actors? Let Google talk your prompts like a pr

PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.
PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Chinese real time voice cloning (VC) and Chinese text to speech (TTS).
Chinese real time voice cloning (VC) and Chinese text to speech (TTS).

Chinese real time voice cloning (VC) and Chinese text to speech (TTS). 好用的中文语音克隆兼中文语音合成系统,包含语音编码器、语音合成器、声码器和可视化模块。

Speech Recognition for Uyghur using Speech transformer

Speech Recognition for Uyghur using Speech transformer Training: this model using CTC loss and Cross Entropy loss for training. Download pretrained mo

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

Clone a voice in 5 seconds to generate arbitrary speech in real-time
Clone a voice in 5 seconds to generate arbitrary speech in real-time

This repository is forked from Real-Time-Voice-Cloning which only support English. English | 中文 Features 🌍 Chinese supported mandarin and tested with

Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation (SIGGRAPH Asia 2021)
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation (SIGGRAPH Asia 2021)

Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation This repository contains the implementation of the following paper: Live Speech

🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time
🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time

English | 中文 Features 🌍 Chinese supported mandarin and tested with multiple datasets: aidatatang_200zh, magicdata, aishell3, data_aishell, and etc. ?

Comments
  • How to install/import

    How to install/import

    Hi,

    I am new to python and linux. So please accept my appologise if this question appears a piece of cake to you.

    In the examples to run this package, you have just written this

    
    import simpleVosk as sv
    def prnt(txt, spk, full):
    	print(txt)
    
    s = sv.Speech(callback=prnt, model="model")
    s.run(blocking=True) 
    

    But the point is how to install via pip or import your package. I tried to install via pip and github but there were certain errors.

     pip install git+https://github.com/Kenneract/Simple-Vosk.git
    Defaulting to user installation because normal site-packages is not writeable
    Collecting git+https://github.com/Kenneract/Simple-Vosk.git
      Cloning https://github.com/Kenneract/Simple-Vosk.git to /tmp/pip-req-build-zdj6s7jm
      Running command git clone --filter=blob:none --quiet https://github.com/Kenneract/Simple-Vosk.git /tmp/pip-req-build-zdj6s7jm
      Resolved https://github.com/Kenneract/Simple-Vosk.git to commit fd4129cbb6a094286b083c5a39582040614923e8
    ERROR: git+https://github.com/Kenneract/Simple-Vosk.git does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.
    

    Hope I have cleared my query.

    Thanks

    opened by furqan915 1
Owner
null
Python functions for summarizing and improving voice dictation input.

Helpmespeak Help me speak uses Python functions for summarizing and improving voice dictation input. Get started with OpenAI gpt-3 OpenAI is a amazing

Margarita Humanitarian Foundation 6 Dec 17, 2022
Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Pytorch-NLU,一个中文文本分类、序列标注工具包,支持中文长文本、短文本的多类、多标签分类任务,支持中文命名实体识别、词性标注、分词等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

null 186 Dec 24, 2022
Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Alexander Veysov 3.2k Dec 31, 2022
PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

StyleSpeech - PyTorch Implementation PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation. Status (2021.06.09

Keon Lee 142 Jan 6, 2023
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS)

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real-time. Feel free to check my thesis if you're curious or if you're looking for info I haven't documented. Mostly I would recommend giving a quick look to the figures beyond the introduction.

Corentin Jemine 38.5k Jan 3, 2023
Simple Speech to Text, Text to Speech

Simple Speech to Text, Text to Speech 1. Download Repository Opsi 1 Download repository ini, extract di lokasi yang diinginkan Opsi 2 Jika sudah famil

Habib Abdurrasyid 5 Dec 28, 2021
A Python module made to simplify the usage of Text To Speech and Speech Recognition.

Nav Module The solution for voice related stuff in Python Nav is a Python module which simplifies voice related stuff in Python. Just import the Modul

Snm Logic 1 Dec 20, 2021
voice2json is a collection of command-line tools for offline speech/intent recognition on Linux

Command-line tools for speech and intent recognition on Linux

Michael Hansen 988 Jan 4, 2023