voice2json is a collection of command-line tools for offline speech/intent recognition on Linux

Michael Hansen

Last update: Jan 4, 2023

Related tags

Text Data & NLP voice2json

Overview

voice2json is a collection of command-line tools for offline speech/intent recognition on Linux. It is free, open source (MIT), and supports 17 human languages.

From the command-line:

$ voice2json transcribe-wav \
      < turn-on-the-light.wav | \
      voice2json recognize-intent | \
      jq .

produces a JSON event like:

{
    "text": "turn on the light",
    "intent": {
        "name": "LightState"
    },
    "slots": {
        "state": "on"
    }
}

when trained with this template:

[LightState]
states = (on | off)
turn (){state} [the] light

voice2json is optimized for:

Sets of voice commands that are described well by a grammar
Commands with uncommon words or pronunciations
Commands or intents that can vary at runtime

It can be used to:

Add voice commands to existing applications or Unix-style workflows
Provide basic voice assistant functionality completely offline on modest hardware
Bootstrap more sophisticated speech/intent recognition systems

Supported speech to text systems include:

CMU's pocketsphinx
Dan Povey's Kaldi
Mozilla's DeepSpeech 0.6
Kyoto University's Julius

Unique Features

voice2json is more than just a wrapper around open source speech to text systems!

Training produces both a speech and intent recognizer. By describing your voice commands with voice2json's templating language, you get more than just transcriptions for free.
Re-training is fast enough to be done at runtime (usually < 5s), even up to millions of possible voice commands. This means you can change referenced slot values or add/remove intents on the fly.
All of the available commands are designed to work well in Unix pipelines, typically consuming/emitting plaintext or newline-delimited JSON. Audio input/output is file-based, so you can receive audio from any source.

Commands

print-profile - Print profile settings
train-profile - Generate speech/intent artifacts
transcribe-wav - Transcribe WAV file to text
transcribe-stream - Transcribe live audio stream to text
recognize-intent - Recognize intent from JSON or text
wait-wake - Listen to live audio stream for wake word
record-command - Record voice command from live audio stream
pronounce-word - Look up or guess how a word is pronounced
generate-examples - Generate random intents
record-examples - Generate and record speech examples
test-examples - Test recorded speech examples
show-documentation - Run HTTP server locally with documentation
print-downloads - Print profile file download information
print-files - Print user profile files for backup

Comments

Node-Red pallette plugin not showing and custom command not working

Hi guys, ok so I have installed the node-red plug in etc, I need to add an custom intent. On voice command it must trigger a node-red flow that has its own timer to run a relay for 20 to 40 seconds when invoked. I have added words as well as intent but does not seem to work. Also the Node-red plugin not available in my palette. Tutorial feels like there is parts missing.

Any pointers would be greatly appreciated. Many thanks for all the help thus far

opened by infinitymakerspace 7
Set locales for docker build

Docker cant be used for German Profiles as it gives asci decode errors while training. This is probably due to missing locales in the docker container.

opened by johanneskropf 6

Build from source - configure does not detect pocketsphinx installed

Configure command:

./configure VOICE2JSON_LANGUAGE=en VOICE2JSON_SPEECH=pocketsphinx --disable-precompiled-binaries

Configure summary:

voice2json configuration summary:

architecture: x86_64/amd64
prefix: /home/ubuntu/Downloads/voice2json/.venv
virtualenv: yes
language: en

wake:
  mycroft precise: yes (x86_64, prebuilt)

speech to text:
  pocketsphinx: no
  kaldi: yes (source)
  julius: no
  deepspeech: no

training:
  opengrm: yes (source)
  phonetisaurus: yes (source)
  kenlm: no

configure: creating ./config.status
config.status: creating Makefile
config.status: creating setup.py
config.status: creating voice2json.sh
config.status: creating voice2json.spec

I am on Ubuntu 18.04 LTS with pocketsphinx, libpocketsphinx3, and libpocketsphinx-dev installed

But if I do ./configure only, the summary is as follows:

voice2json configuration summary:

architecture: x86_64/amd64
prefix: /home/ubuntu/Downloads/voice2json/.venv
virtualenv: yes
language: 

wake:
  mycroft precise: yes (x86_64, prebuilt)

speech to text:
  pocketsphinx: yes (source)
  kaldi: yes (prebuilt)
  julius: yes (prebuilt)
  deepspeech: yes (amd64, prebuilt)

training:
  opengrm: yes (prebuilt)
  phonetisaurus: yes (prebuilt)
  kenlm: yes (prebuilt)

configure: creating ./config.status
config.status: creating Makefile
config.status: creating setup.py
config.status: creating voice2json.sh
config.status: creating voice2json.spec

I wanted to build from source with pocketsphinx only and the former seems to include kaldi instead of pocketsphinx. If I remove kaldi from my system, voice2json generates error that kaldi is missing

opened by ekawahyu 5

Raspberry Pi Docker Image - USB Audio issues

Hi guys, thanks for all the help thus far. I am at a point where I test with transcribe stream. I am using a USB sound card and its set as default on the Raspberry Pi in Alsamixer.

When running the voice2json transcribe-stream, I am reveiving this response.

pi@raspberrypi:~ $ voice2json transcribe-stream ALSA lib confmisc.c:767:(parse_card) cannot find card '0' ALSA lib conf.c:4568:(_snd_config_evaluate) function snd_func_card_driver returned error: No such file or directory ALSA lib confmisc.c:392:(snd_func_concat) error evaluating strings ALSA lib conf.c:4568:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory ALSA lib confmisc.c:1246:(snd_func_refer) error evaluating name ALSA lib conf.c:4568:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5047:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2564:(snd_pcm_open_noupdate) Unknown PCM default arecord: main:828: audio open error: No such file or directory

I have seen an issue that is still open and looked at that and ran the following , with no joy... pi@raspberrypi:~ $ voice2json transcribe-stream --device /dev/snd:/dev/snd usage: voice2json [-h] [--profile PROFILE] [--base-directory BASE_DIRECTORY] [--certfile CERTFILE] [--keyfile KEYFILE] [--setting SETTING SETTING] [--machine MACHINE] [--debug] {print-version,print-profile,print-downloads,print-files,train-profile,transcribe-wav,transcribe-stream,recognize-intent,record-command,wait-wake,pronounce-word,generate-examples,record-examples,test-examples,show-documentation,speak-sentence} ... voice2json: error: unrecognized arguments: --device /dev/snd:/dev/snd pi@raspberrypi:~ $

somehow the docker image isnt working from the usb sound device, but I am slightly lost.

opened by infinitymakerspace 5
Error while using transcribe-stream

Hello All, after training the profile following the getting started guide, I am trying to run transcribe-stream, but I am getting the following error:

ALSA lib pcm_hw.c:1822:(_snd_pcm_hw_open) Invalid value for card arecord: main:828: audio open error: No such file or directory

What can be the issue? I have the correct hardware card stored in a .asoundrc file. Is there any other option I can give to voice2json to use the proper audio device?

Thanks

opened by arnamoy10 5
Update DeepSpeech to v0.9.3

Hi, awesome project :) As the newer DeepSpeech models are so much better, is there a way to update to the current version?

Or would you recommend using Rhasspy?
enhancement

opened by solhuebner 4

transcribe-stream -a not working from input file / stdin

Running the following results in a no-op on both 2.0 and latest:

voice2json transcribe-stream -a etc/test/what_time_is_it.wav --wav-sink streamtest.wav --event-sink streamtest.log

The resulting wav-sink is hiccup-y noise, and the event sink is:

{"type": "speech", "time": 0.06}
{"type": "silence", "time": 0.24}
{"type": "speech", "time": 1.4400000000000008}
{"type": "silence", "time": 1.620000000000001}
{"type": "speech", "time": 8.459999999999981}
{"type": "silence", "time": 8.639999999999983}
{"type": "speech", "time": 8.759999999999984}
{"type": "started", "time": 9.059999999999986}
{"type": "silence", "time": 10.439999999999998}
{"type": "stopped", "time": 11.760000000000009}
{"type": "speech", "time": 0.18}
{"type": "started", "time": 0.48}
{"type": "silence", "time": 0.54}
{"type": "speech", "time": 1.0200000000000005}
{"type": "silence", "time": 4.859999999999998}
{"type": "stopped", "time": 5.459999999999994}
{"type": "speech", "time": 0.54}
{"type": "started", "time": 0.8400000000000003}
{"type": "silence", "time": 1.560000000000001}
{"type": "stopped", "time": 3.5400000000000027}
{"type": "speech", "time": 4.56}
{"type": "silence", "time": 4.859999999999998}

Thanks again for your hard work on voice2json! 🙂

opened by lukifer 4

audio-source - for transcribe-stream ?

Hello @synesthesiam, and thanks for your amazing work !

I am trying to stream from MQTT to transcribe-stream, but I can't.

When I try to transcribe-stream from stdin :

sox -t wav /tmp/test.wav -t wav - | /usr/bin/voice2json --debug transcribe-stream --audio-source -

I get that :

AttributeError: 'NoneType' object has no attribute 'stdout'

but I don't understand when I spoke about stdout ?

Regards,

Romain

opened by farfade 4

Install error using "sudo apt install voice2json_2.0_armhf.deb" - E: Unsupported file /pi/voice2json_2.0_armhf.deb given on commandline

Following the directions to install the 'deb' I ran into two issues

the documentation says Next, download the appropriate .deb file for your CPU architecture:

amd64 - Desktops, laptops, and servers
armhf - Raspberry Pi 2, and 3 (armv7)
arm64 - Raspberry Pi 3+, 4
armel - Raspberry Pi 0, 1

I have a Raspberry Pi 3 Model B Plus Rev 1.3 but when I run

dpkg-architecture | grep DEB_BUILD_ARCH=

I get: DEB_BUILD_ARCH=armhf

running the command: sudo apt install voice2json_2.0_armhf.deb results in

Reading package lists... Done
Building dependency tree       
Reading state information... Done
E: Unable to locate package voice2json_2.0_armhf.deb
E: Couldn't find any package by glob 'voice2json_2.0_armhf.deb'
E: Couldn't find any package by regex 'voice2json_2.0_armhf.deb'

After much digging I tried sudo dpkg -i voice2json_2.0_armhf.deb this ran but I got the following:

Selecting previously unselected package voice2json.
(Reading database ... 49992 files and directories currently installed.)
Preparing to unpack voice2json_2.0_armhf.deb ...
Unpacking voice2json (2.0.1) ...
dpkg: dependency problems prevent configuration of voice2json:
 voice2json depends on espeak; however:
  Package espeak is not installed.
 voice2json depends on jq; however:
  Package jq is not installed.
 voice2json depends on libportaudio2; however:
  Package libportaudio2 is not installed.
 voice2json depends on libatlas3-base; however:
  Package libatlas3-base is not installed.

dpkg: error processing package voice2json (--install):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 voice2json

Do I need to install espeak, jq, libportaudio2 and libatlas3-base and if so, this should be in the install notes.

opened by juggledad 3

Query Json with python

Is there a way to use input from voice2json to trigger action with pyhton script on raspberry pi?

Node Red is a mess and not working smooth and no simple examples.

Regards Gert

opened by infinitymakerspace 3
getting 404's when trying to download Spanish profiles

I am trying to get the spanish profiles with voice2json --profile es download-profile, and I'm getting 404's when it attempts to download them from github.

All models fail except for the pocketsphinx one, and I was able to download the default english one too. I have attached the errors I get when trying to download the default "es" profile.

404.txt

(Also: I'm using the latest Docker version)
bug

opened by Sondeluz 2

Could not find a version that satisfies the requirement rhasspynlu (from versions: none)

./bin/voice2json 
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/data1/protected/Programming/git/voice2json/voice2json/__main__.py", line 26, in <module>
    from .pronounce import pronounce
  File "/home/data1/protected/Programming/git/voice2json/voice2json/pronounce.py", line 15, in <module>
    import rhasspynlu
ModuleNotFoundError: No module named 'rhasspynlu'
/home/admin/Programming/git/voice2json
$ pip install rhasspynlu
Defaulting to user installation because normal site-packages is not writeable
ERROR: Could not find a version that satisfies the requirement rhasspynlu (from versions: none)
ERROR: No matching distribution found for rhasspynlu
/home/admin/Programming/git/voice2json
$

opened by gnusupport 0

Output contains "doors"

Heyho,

I'm running voice2json via docker on an M1 Mac. I used multiple .wav files, all produced by Davinci resolve, all in English in perfect audio quality. I can't upload the .wav files directly, but the episodes are published via .mp3 here. And every time I get an output with something regarding doors and lights... I'm very confused :D

{"text": "off open green open the living set on off hot set me door the door set to set temperature open the green open hot open living room lamp whats lamp hot how tell tell lamp set living turn is it door open set tell the set to is garage door open is it living me it whats it to red blue whats the temperature living blue me cold is it lamp off the living set cold make set lamp me whats door how hot is red on whats how off it turn off tell whats how whats turn the living what off garage light red living off is how on how turn on the living turn time living open to the on whats how lamp set to whats set what blue off closed whats the temperature is it living make room lamp whats me tell lamp cold room on time on whats room on off open door closed garage door open set turn off on whats the on time open make set on red the on living the what is it cold hot on on light to light to how blue green set living closed garage whats to the off the is light tell make bedroom light blue whats turn off tell door whats blue set living make the living room lamp the off red is lamp whats set living room lamp how temperature on the is is the time to off make the is is it open on cold it how hot on the the open closed living tell me on whats light to open closed red cold open cold is is what door it lamp cold the turn set garage make garage garage is cold bedroom living how on the open cold is on to living turn off open what turn off off hot is the door closed living garage whats red the me set the garage on the what is it green how blue off off whats time light the is on living garage light is it on turn off light it lamp turn it living room lamp off the whats it on living cold is the garage door set on living how the", "likelihood": 1, "transcribe_seconds": 9.57908892100022, "wav_seconds": 105.6426875, "tokens": null}

Do you have any idea what the problem could be? Thank you! Luka

opened by LukaHarambasic 0

GLIBC_2.28 needed

Setting up libc6:amd64 (2.27-3ubuntu1.6) ...
Setting up libc6:i386 (2.27-3ubuntu1.6) ...
Setting up libc6-i386 (2.27-3ubuntu1.6) ...
Setting up libc-dev-bin (2.27-3ubuntu1.6) ...
Setting up libc6-dev:amd64 (2.27-3ubuntu1.6) ...
Setting up libc6-dbg:amd64 (2.27-3ubuntu1.6) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Processing triggers for libc-bin (2.27-3ubuntu1.4) ...
hbarnard@hbarnard-OptiPlex-9020:~/Downloads$ voice2json --help
/usr/lib/voice2json/usr/local/bin/python3: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by /usr/lib/voice2json/usr/local/bin/python3)

I'd guess this is partly from use of Linux Mint rather than Ubuntu? However, I'm not planning to try and upgrade because of the risks to the rest of the system. Any other possibilities? Docker downloads everything it needs but doesn't start a container, same problem perhaps?

opened by hbarnard 0

Possibility of improving Chinese speech recognition (speech to text)
I am using voice2json as a voice command recognition backend in my voice interaction mod for a video game. As a native Chinese speaker, I find voice2json's Chinese support rather limited:

voice2json does not perform Chinese word segmentation, which means that users must perform word segmentation in sentences.ini by themselves.

In order to use voice2json, my program had to do Chinese word segmentation when generating sentences.ini.

Pronunciation prediction doesn't seem to work at all. Any word that is not in the dictionary is completely unrecognizable.

In order not to lose any words in the sentence, my program splits any Chinese words that are not in base_dictionary.txt into individual Chinese characters, so that they are in the dictionary and voice2json can handle it.

No ability to deal with foreign languages. All English words appearing in the sentence seem to be discarded.

My program can't do anything about it. Any foreign words in the sentence can simply be discarded.

The only available PocketSphinx and CMU models have poor recognition performance, with recognition accuracy far lower than the Microsoft Speech Recognition API that comes with Windows, and much worse than the English kaldi model.

This has reached an unusable level for my program. I would recommend Chinese users to use the old Microsoft speech recognition engine.

However, one English user gave excellent feedback:

The new speech recognition is much better then default windows one, it gets conversations almost every time, and takes a fraction of the time.

This is also the same as my own test. I was impressed that the default en-us_kaldi-zamia model gave extremely accurate results in a very short time even when I spoke with a crappy foreign accent.

So about any possibility of improving Chinese speech recognition

Intelligent Tokenizer (Word Segmenter)

Here is a simple project for it: fxsjy/Jieba. I use it for my application and it works good (I used the .NET port of it).

A demo:

pip3 install jieba

test.py

# encoding=utf-8 import jieba strs=[ "我来到北京清华大学", "乒乓球拍卖完了", "中国科学技术大学", "他来到了网易杭研大厦", "小明硕士毕业于中国科学院计算所，后在日本京都大学深造" ] for str in strs: seg_list = jieba.cut(str) print(' '.join(list(seg_list)))

Result:

Building prefix dict from the default dictionary ... Loading model from cache /tmp/jieba.cache Loading model cost 0.458 seconds. Prefix dict has been built successfully. 我来到北京清华大学乒乓球拍卖完了中国科学技术大学他来到了网易杭研大厦小明硕士毕业于中国科学院计算所，后在日本京都大学深造

An HMM model will be used for new word prediction.

Pronunciation Prediction

Chinese pronunciation is character-based. The pronunciation of Chinese words is the concatenation of the pronunciation of each character.

So, split the unknown word into individual characters and get the pronunciation and splice it, and you have the pronunciation of the unknown word. This doesn't even require training a neural network.

I use this method in my program and it works well. If the word returned by jieba.cut() is not in base_dictionary.txt, I split it into a sequence of single Chinese characters.

日本京都大学 -> 日本京都大学 -> r iz4 b en3 j ing1 d u1 d a4 x ve2

Completely correct.

The only caveat is that some characters may have multiple pronunciations, and you need to take into account the possibility of each pronunciation when combining them. At this point, training a neural network is more advantageous. However, even without training a neural network, it is possible to generate pronunciations, which can be assumed to have equal probability for each pronunciation.

虎绿林 -> 虎绿林 -> (h u3 l v4 l in2 | h u3 l u4 l in2)

IPA pronunciation dictionary

I have one: https://github.com/SwimmingTiger/BigCiDian

phones.txt

lexicon.txt

Chao tone letters (IPA) are used to mark pitch.

This dictionary contains pronunciations of Chinese words and common English words.

Foreign language support

English words sometimes appear in spoken and written Chinese, and these words retain their English written form.

eg. 我买了一台Mac笔记本，用的是macOS，我用起来还是不习惯，等哪天给它装个Windows系统。

Therefore, Chinese speech recognition engines usually need to have the ability to process two languages at the same time. If an English word is encountered, it is processed according to English rules (including pronunciation prediction).

If it is a Chinese word or a compound word (such as "U盘", means USB Flash Drive), it will be processed according to Chinese rules.

For example, in word segmentation, English words cannot be split into individual characters.

It seems possible to train a model that includes both Chinese and English. Of course it might be convenient if voice2json supports model mixing - Combine pure Chinese model and pure English model into the same model - I don't know if it's technically possible.

Number to Words

Here is a complete C# implementation.

Finding or writing a well-rounded Python implementation doesn't seem that hard.

Audio Corpora

Mozilla Common Voice already has a big enough Chinese Audio Corpora:

https://commonvoice.mozilla.org/zh-CN/datasets

https://commonvoice.mozilla.org/zh-TW/datasets

https://commonvoice.mozilla.org/zh-HK/datasets

Convert between Simplified Chinese and Traditional Chinese

Traditional Chinese and Simplified Chinese are just different written forms of Chinese characters, their spoken language is the same.

https://github.com/SwimmingTiger/BigCiDian is a Simplified Chinese pronunciation dictionary (without traditional Chinese characters). So it may be easier to deal with converting all texts into Simplified Chinese.

https://github.com/yichen0831/opencc-python can do this very well.

test.py pip3 install opencc-python-reimplemented

from opencc import OpenCC cc = OpenCC('t2s') # convert from Traditional Chinese to Simplified Chinese to_convert = '開放中文轉換' converted = cc.convert(to_convert) print(converted)

Result: 开放中文转换

Convert it before tokenization (word segmentation).

Calling t2s conversion on Simplified Chinese has no side effects. So there is no need to detect before conversion.

Complete preprocessing pipeline for text

Convert Traditional to Simplified -> Number to Words -> Tokenizer (Word Segmentation) -> Convert to Pronunciation -> Unknown Word Pronunciation Prediction (Chinese and English may have different modes, handwritten code or neural network)

Why does the number-to-word appear before the tokenizer?

Because the output of number-to-word is also a Chinese sentence, there is no space separation between words.

Model Training

I want to train a Chinese kaldi model for voice2json. Maybe I can use the steps and tools of Rhasspy.

To train a Chinese model using https://github.com/rhasspy/ipa2kaldi, it looks like I need to add Chinese support to https://github.com/rhasspy/gruut.

If there is any progress, I will update here. Any suggestions are also welcome.
opened by SwimmingTiger 1
slow performance in raspberry

Hi!, i installed voice2json in a raspberry pi 3 model b, and it works really slow. I also have installed Rhasspy (docker version) in the raspberry and Rhasspy detects everything quite fast.

There is any recommended hardware or system to work with voice2json ?

Cheers!.

opened by ch-rigu 0

Releases(v2.1)

v2.1(Jun 3, 2021)
[2.1] - 3 Jun 2021

Added

download-profile command and auto-download of relevant language files from Github

New Kaldi profiles for Czech, English, Spanish, French, Dutch, Russian, and Swedish

New DeepSpeech profiles for German, French, and Italian

Changed

Upgrade DeepSpeech support to v0.9

--profile argument can now be a language name, profile name, or directory

Source code(tar.gz)
Source code(zip)
voice2json_2.1_amd64.deb(156.77 MB)
voice2json_2.1_arm64.deb(110.43 MB)
voice2json_2.1_armhf.deb(118.50 MB)
v2.0(Jun 4, 2020)

Source code(tar.gz)
Source code(zip)
voice2json_2.0_amd64.deb(129.13 MB)
voice2json_2.0_arm64.deb(81.79 MB)
voice2json_2.0_armel.deb(16.66 MB)
voice2json_2.0_armhf.deb(88.27 MB)
v2.0-beta(May 1, 2020)

Source code(tar.gz)
Source code(zip)
voice2json_2.0.0_amd64.deb(128.37 MB)
voice2json_2.0.0_arm64.deb(81.78 MB)
voice2json_2.0.0_armel.deb(16.65 MB)
voice2json_2.0.0_armhf.deb(88.26 MB)
v1.0(Nov 2, 2019)

Source code(tar.gz)
Source code(zip)
voice2json_1.0_aarch64.deb(73.61 MB)
voice2json_1.0_amd64.deb(78.23 MB)
voice2json_1.0_armhf.deb(72.76 MB)

Owner

Michael Hansen

Computer scientist, open source voice assistant enthusiast.

GitHub

A Python wrapper for simple offline real-time dictation (speech-to-text) and speaker-recognition using Vosk.

Simple-Vosk A Python wrapper for simple offline real-time dictation (speech-to-text) and speaker-recognition using Vosk. Check out the official Vosk G

2 Jun 19, 2022

Unsupervised intent recognition

INTENT author: steeve LAQUITAINE description: deployment pattern: currently batch only Setup & run git clone https://github.com/slq0/intent.git bash

1 Apr 8, 2022

Text to speech is a process to convert any text into voice. Text to speech project takes words on digital devices and convert them into audio. Here I have used Google-text-to-speech library popularly known as gTTS library to convert text file to .mp3 file. Hope you like my project!

Text to speech (using Python) Text to speech is a process to convert any text into voice. Text to speech project takes words on digital devices and co

19 Jun 30, 2022

Speech Recognition for Uyghur using Speech transformer

Speech Recognition for Uyghur using Speech transformer Training: this model using CTC loss and Cross Entropy loss for training. Download pretrained mo

11 Nov 17, 2022

A Python module made to simplify the usage of Text To Speech and Speech Recognition.

Nav Module The solution for voice related stuff in Python Nav is a Python module which simplifies voice related stuff in Python. Just import the Modul

1 Dec 20, 2021

Utility for Google Text-To-Speech batch audio files generator. Ideal for prompt files creation with Google voices for application in offline IVRs

Google Text-To-Speech Batch Prompt File Maker Are you in the need of IVR prompts, but you have no voice actors? Let Google talk your prompts like a pr

1 Aug 19, 2021

Command Line Text-To-Speech using Google TTS

cli-tts Thanks to gTTS by @pndurette! This is an interactive command line text-to-speech tool using Google TTS. Just type text and the voice will be p

3 Nov 11, 2022

Intent parsing and slot filling in PyTorch with seq2seq + attention

PyTorch Seq2Seq Intent Parsing Reframing intent parsing as a human - machine translation task. Work in progress successor to torch-seq2seq-intent-pars

159 Apr 4, 2022

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

⚠️ Checkout develop branch to see what is coming in pyannote.audio 2.0: a much smaller and cleaner codebase Python-first API (the good old pyannote-au

2.2k Jan 9, 2023

Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

3.2k Dec 31, 2022

PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

1k Dec 30, 2022

Simple Speech to Text, Text to Speech

Simple Speech to Text, Text to Speech 1. Download Repository Opsi 1 Download repository ini, extract di lokasi yang diinginkan Opsi 2 Jika sudah famil

5 Dec 28, 2021

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

29 Oct 16, 2022

HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools

HuggingSound HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools. I have no intention of building a very complex tool here.

247 Dec 26, 2022

PhoNLP: A BERT-based multi-task learning toolkit for part-of-speech tagging, named entity recognition and dependency parsing

PhoNLP is a multi-task learning model for joint part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results, outperforming a single-task learning approach that fine-tunes the pre-trained Vietnamese language model PhoBERT for each task independently.

109 Dec 2, 2022

Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

anaGo anaGo is a Python library for sequence labeling(NER, PoS Tagging,...), implemented in Keras. anaGo can solve sequence labeling tasks such as nam

1.5k Dec 5, 2022

Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

anaGo anaGo is a Python library for sequence labeling(NER, PoS Tagging,...), implemented in Keras. anaGo can solve sequence labeling tasks such as nam

1.4k Feb 17, 2021

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

The PyTorch-Kaldi Speech Recognition Toolkit PyTorch-Kaldi is an open-source repository for developing state-of-the-art DNN/HMM speech recognition sys

2.3k Dec 27, 2022

Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Espresso Espresso is an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning libra

919 Jan 3, 2023

voice2json is a collection of command-line tools for offline speech/intent recognition on Linux

Related tags

Overview

Unique Features

Commands

Comments

So about any possibility of improving Chinese speech recognition

Intelligent Tokenizer (Word Segmenter)

Pronunciation Prediction

IPA pronunciation dictionary

Foreign language support

Number to Words

Audio Corpora

Convert between Simplified Chinese and Traditional Chinese

Complete preprocessing pipeline for text

Model Training

Releases(v2.1)

v2.1(Jun 3, 2021)

[2.1] - 3 Jun 2021

Added

Changed

v2.0(Jun 4, 2020)

v2.0-beta(May 1, 2020)

v1.0(Nov 2, 2019)

Owner

Michael Hansen

A Python wrapper for simple offline real-time dictation (speech-to-text) and speaker-recognition using Vosk.

Unsupervised intent recognition

Text to speech is a process to convert any text into voice. Text to speech project takes words on digital devices and convert them into audio. Here I have used Google-text-to-speech library popularly known as gTTS library to convert text file to .mp3 file. Hope you like my project!

Speech Recognition for Uyghur using Speech transformer

A Python module made to simplify the usage of Text To Speech and Speech Recognition.

Utility for Google Text-To-Speech batch audio files generator. Ideal for prompt files creation with Google voices for application in offline IVRs

Command Line Text-To-Speech using Google TTS

Intent parsing and slot filling in PyTorch with seq2seq + attention

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

Simple Speech to Text, Text to Speech

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools

PhoNLP: A BERT-based multi-task learning toolkit for part-of-speech tagging, named entity recognition and dependency parsing

Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

Espresso: A Fast End-to-End Neural Speech Recognition Toolkit