Simple, hackable offline speech to text - using the VOSK-API.

Overview

Nerd Dictation

Offline Speech to Text for Desktop Linux. See the demo video.

This is a utility that provides simple access to speech to text for use in Linux, without being tied to a desktop environment.

Simple
This is a single file Python script with minimal dependencies.
Hackable
User configuration lets you manipulate text using Python string operations.
Zero Overhead
As this relies on manual activation there are no background processes.

Dictation is accessed manually with begin/end commands.

This uses the excellent vosk-api.

Usage

It is suggested to bind begin/end/cancel to shortcut keys.

nerd-dictation begin
nerd-dictation end

For details on how this can be used, see: nerd-dictation --help and nerd-dictation begin --help.

Features

Specific features include:

Numbers as Digits

Optional conversion from numbers to digits.

So Three million five hundred and sixty second becomes 3,000,562nd.

A series of numbers (such as reciting a phone number) is also supported.

So Two four six eight becomes 2,468.
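The digit-series behavior can be sketched as follows. This is a minimal illustration, not the script's actual implementation (which also handles ordinals such as "sixty second" becoming "62nd" and larger magnitudes); the function name is made up:

```python
# Hypothetical sketch of converting a recited digit series to digits.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}

def digit_series(words, use_separator=False):
    """Convert a recited series such as "two four six eight" to "2468",
    or "2,468" when a separator is requested (--numbers-use-separator)."""
    digits = []
    for word in words.lower().split():
        if word not in UNITS:
            return None  # not a pure digit series
        digits.append(str(UNITS[word]))
    number = "".join(digits)
    # Note: int() drops leading zeros, a simplification of the real tool.
    return format(int(number), ",") if use_separator else number
```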

Time Out
Optionally end speech to text early when no speech is detected for a given number of seconds (without an explicit call to end, which is otherwise required).
Output Type
Output can simulate keystroke events (default) or simply print to the standard output.
User Configuration Script
User configuration is just a Python script which can be used to manipulate text using Python's full feature set.

See nerd-dictation begin --help for details on how to access these options.

Dependencies

  • Python 3.
  • The VOSK-API.
  • parec command (for recording from pulse-audio).
  • xdotool command to simulate keyboard input.

Install

pip3 install vosk
git clone https://github.com/ideasman42/nerd-dictation.git
cd nerd-dictation
wget https://alphacephei.com/kaldi/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 model

To test dictation:

./nerd-dictation begin --vosk-model-dir=./model &
# Start speaking.
./nerd-dictation end
  • Reminder that it's up to you to bind begin/end/cancel to actions you can easily access (typically key shortcuts).

  • To avoid having to pass the --vosk-model-dir argument, copy the model to the default path:

    mkdir -p ~/.config/nerd-dictation
    mv ./model ~/.config/nerd-dictation

Hint

Once this is working properly you may wish to download one of the larger language models for more accurate dictation. They are available here.

Configuration

This is an example of a trivial configuration file which simply makes the input text uppercase.

# ~/.config/nerd-dictation/nerd-dictation.py
def nerd_dictation_process(text):
    return text.upper()

A more comprehensive configuration is included in the examples/ directory.

Hints

  • The processing function can be used to implement your own actions using keywords of your choice. Simply return a blank string if you have implemented your own text handling.
  • Context sensitive actions can be implemented using command line utilities to access the active window.
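Putting both hints together, a hypothetical configuration might look like this. The keyword and the xdotool action are illustrative assumptions, not built-in behavior:

```python
# Hypothetical ~/.config/nerd-dictation/nerd-dictation.py
import subprocess

# Illustrative keyword -> command table (not part of nerd-dictation).
ACTIONS = {
    "scratch that": ["xdotool", "key", "ctrl+z"],
}

def nerd_dictation_process(text):
    command = ACTIONS.get(text.strip().lower())
    if command is not None:
        subprocess.call(command)
        return ""  # handled here, so type nothing
    return text
```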

Paths

Local Configuration
~/.config/nerd-dictation/nerd-dictation.py
Language Model

~/.config/nerd-dictation/model

Note that --vosk-model-dir=PATH can be used to override the default.

Command Line Arguments

Output of nerd-dictation --help

usage:

nerd-dictation [-h]  ...

This is a utility that activates speech to text in Linux. While it could use any speech-to-text system, it currently uses the VOSK-API.

positional arguments:

begin: Begin dictation.
end: End dictation.
cancel: Cancel dictation.
optional arguments:
-h, --help show this help message and exit

Subcommand: begin

usage:

nerd-dictation begin [-h] [--cookie FILE_PATH] [--vosk-model-dir DIR]
                     [--pulse-device-name IDENTIFIER]
                     [--sample-rate HZ] [--defer-output] [--continuous]
                     [--timeout SECONDS] [--idle-time SECONDS]
                     [--delay-exit SECONDS]
                     [--punctuate-from-previous-timeout SECONDS]
                     [--full-sentence] [--numbers-as-digits]
                     [--numbers-use-separator] [--output OUTPUT_METHOD]
                     [- ...]

This begins dictation. It creates the directory used to store internal data, so other commands such as end can be performed.

optional arguments:
-h, --help show this help message and exit
--cookie FILE_PATH
  Location for writing a temporary cookie (this file is monitored to begin/end dictation).
--vosk-model-dir DIR
  Path to the VOSK model, see: https://alphacephei.com/vosk/models
--pulse-device-name IDENTIFIER
  The name of the pulse-audio device to use for recording. See the output of "pactl list sources" to find device names (using the identifier following "Name:").
--sample-rate HZ
  The sample rate to use for recording (in Hz). Defaults to 44100.
--defer-output

When enabled, output is deferred until exiting.

This prevents text being typed during speech (implied with --output=STDOUT)

--continuous Enable this option when you intend to keep the dictation process running for extended periods of time. Without this enabled, the entirety of the dictation session is processed on every update. Only used when --defer-output is disabled.
--timeout SECONDS
  Time out recording when no speech is processed for the time in seconds. This can be used to avoid having to explicitly exit (zero disables).
--idle-time SECONDS
  Time to idle between processing audio from the recording. Setting to zero is the most responsive at the cost of high CPU usage. The default value is 0.1 (processing 10 times a second), which is quite responsive in practice (the maximum value is clamped to 0.5)
--delay-exit SECONDS
  The time to continue running after an exit request. This can be useful so "push to talk" setups can be released while you finish speaking (zero disables).
--punctuate-from-previous-timeout SECONDS
  The time-out in seconds for detecting the state of dictation from the previous recording; this can be useful so punctuation is added before entering the dictation (zero disables).
--full-sentence
  Capitalize the first character. This is also used to add either a comma or a full stop when dictation is performed under the --punctuate-from-previous-timeout value.
--numbers-as-digits
  Convert numbers into digits instead of using whole words.
--numbers-use-separator
  Use comma separators for numbers.
--output OUTPUT_METHOD

Method used to output the result of speech to text.

  • SIMULATE_INPUT simulate keystrokes (default).
  • STDOUT print the result to the standard output. Be sure only to handle text from the standard output as the standard error may be used for reporting any problems that occur.
- ... End argument parsing.
This can be used for user-defined arguments, which configuration scripts may read from sys.argv.
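A configuration script can read such user-defined arguments from sys.argv. A hedged sketch, where the --shout flag is an invented example rather than a real option:

```python
# Hypothetical ~/.config/nerd-dictation/nerd-dictation.py, invoked as:
#   nerd-dictation begin - --shout
import sys

# Arguments after "-" on the command line are visible in sys.argv.
SHOUT = "--shout" in sys.argv

def nerd_dictation_process(text):
    return text.upper() if SHOUT else text
```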

Subcommand: end

usage:

nerd-dictation end [-h] [--cookie FILE_PATH]

This ends dictation, causing the text to be typed in.

optional arguments:
-h, --help show this help message and exit
--cookie FILE_PATH
  Location for writing a temporary cookie (this file is monitored to begin/end dictation).

Subcommand: cancel

usage:

nerd-dictation cancel [-h] [--cookie FILE_PATH]

This cancels dictation.

optional arguments:
-h, --help show this help message and exit
--cookie FILE_PATH
  Location for writing a temporary cookie (this file is monitored to begin/end dictation).

Details

  • Typing in results will never press enter/return.
  • Pulse audio is used for recording.
  • Recording and speech to text are performed in parallel.
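The last point can be sketched as a producer/consumer pattern. This is a simplified stand-in, not nerd-dictation's actual code; the real script reads raw audio from the parec pipe and feeds it to a VOSK recognizer:

```python
import queue
import threading

def record(chunks, audio_queue):
    # Stand-in for reading blocks of raw audio from the recorder.
    for chunk in chunks:
        audio_queue.put(chunk)
    audio_queue.put(None)  # signal end of stream

def transcribe(audio_queue, recognize):
    # Consume audio as it arrives, while recording continues in parallel.
    results = []
    while True:
        chunk = audio_queue.get()
        if chunk is None:
            return results
        results.append(recognize(chunk))

audio = queue.Queue()
recorder = threading.Thread(target=record, args=([b"chunk-a", b"chunk-b"], audio))
recorder.start()
words = transcribe(audio, lambda chunk: chunk.decode())
recorder.join()
```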

Examples

Store the result of speech to text as a variable in the shell:

SPEECH="$(nerd-dictation begin --timeout=1.0 --output=STDOUT)"

Example Configurations

These are example configurations you may use as a reference.

Other Software

  • Elograf - nerd-dictation GUI front-end that runs as a tray icon.

Limitations

  • Text from VOSK is all lower-case; while the user configuration can be used to set the case of common words such as "I", this isn't very convenient (see the example configuration for details).

  • For some users the start-up delay may be noticeable on systems with slower hard disks, especially when running for the first time (a cold start).

    This is a limitation of the choice not to use a service that runs in the background. To mitigate this, recording begins before any of the speech-to-text components are loaded.

Further Work

  • Add a general solution to capitalize words (proper nouns, for example).
  • Wayland support (this should be quite simple to support and mainly relies on a replacement for xdotool).
  • Add a setup.py for easy installation on users' systems.
  • Possibly other speech to text engines (only if they provide some significant benefits).
  • Possibly support Windows & macOS.
Comments
  • Packaging


    Hello, I have the idea to package nerd-dictation for Pypi.org. I tested adding a setup.py and setup.cfg file. Thus I tried to consider the nerd-dictation file as a module, adding a console script entry. At this step, I'm facing a problem: the name nerd-dictation is not allowed because of the dash; the name generates a syntax error with `import nerd-dictation`. Can it be considered to change the name to nerd_dictation instead of nerd-dictation? I haven't yet explored another way that installs the nerd-dictation script directly rather than as a module/console script. What do you think about that? The background idea is to distribute it for easy installation with pip install, and also so that elograf can require it as a dependency.

    opened by papoteur-mga 8
  • xdotool: freezes the OS


    When I run the program by assigning the command "nerd-dictation begin --timeout 1 --numbers-as-digits --numbers-use-separator" to a custom Keyboard Shortcut on Ubuntu 20.04 it seems to be freezing every single time. Any fixes for this? It seems to behave like a memory leak, It completely crashes the OS.

    opened by 52617365 8
  • Add shell.nix and package vosk


    Greetings, started playing around with this the other day.. I run NixOS so I had to lay some groundwork first.. Thought others might appreciate it too.

    It just drops you into a nix-shell with the required packages so you can run nerd-dictation. It packages a couple of the English models. Easy enough to copy for other language models though. :)

    opened by mankyKitty 6
  • Is using nerd-dictation to control software a solved problem?


    I want to use nerd-dictation for processing my photos, basically:

    • show photo
    • wait for command (next previous delete promote)
    • if command is detected: show what was detected (or produce sound feedback?), execute action

    I am not entirely sure what would be the best way to implement this - has anyone done something like that already? It seems a relatively obvious use of actually working voice-to-text.

    (maybe using nerd-dictation is a mistake and I should be using vosk API directly?)

    question 
    opened by matkoniecz 6
  • No keystrokes appear in LibreOffice Writer


    With some recent upgrade to either Ubuntu or LibreOffice, I have noticed that I cannot use nerd-dictation in LibreOffice Writer. No text appears. nerd-dictation works fine with Chrome or Thunderbird windows. It did not used to be this way. I have upgraded from Ubuntu 18 to 21.10 recently, so perhaps there was some change in that period; maybe there's some sort of security policy that prevents simulated keystrokes? Just a guess. LibreOffice is 7.2.3.2.

    opened by xenotropic 5
  • Russian input lags entire interface


    Russian input lags entire interface. But some programs (Blender for example) don't lag at all (also Blender usually launched in fullscreen). English input works fine. Model: "vosk-model-small-ru-0.22"

    opened by scaledteam 5
  • What is the correct format for --pulse-device-name?


    First off - thank you. This is precisely what I have been looking for. Great work here!

    I want to ensure that the program is using the right microphone - I want to make sure it uses the external one, not the one on my laptop. Running pactl list gives me a WHOLE slew of stuff, but I think this is the chunk I'm most interested in, since it lists my external microphone:

    Card #2
    	Name: alsa_card.usb-BLUE_MICROPHONE_Blue_Snowball_201603-00
    	Driver: module-alsa-card.c
    	Owner Module: 28
    	Properties:
    		alsa.card = "1"
    		alsa.card_name = "Blue Snowball"
    		alsa.long_card_name = "BLUE MICROPHONE Blue Snowball at usb-0000:00:14.0-3, full speed"
    		alsa.driver_name = "snd_usb_audio"
    		device.bus_path = "pci-0000:00:14.0-usb-0:3:1.0"
    		sysfs.path = "/devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3:1.0/sound/card1"
    		udev.id = "usb-BLUE_MICROPHONE_Blue_Snowball_201603-00"
    		device.bus = "usb"
    		device.vendor.id = "0d8c"
    		device.vendor.name = "C-Media Electronics, Inc."
    		device.product.id = "0005"
    		device.product.name = "Blue Snowball"
    		device.serial = "BLUE_MICROPHONE_Blue_Snowball_201603"
    		device.string = "1"
    		device.description = "Blue Snowball"
    		module-udev-detect.discovered = "1"
    		device.icon_name = "audio-card-usb"
    	Profiles:
    		input:mono-fallback: Mono Input (sinks: 0, sources: 1, priority: 1, available: yes)
    		input:multichannel-input: Multichannel Input (sinks: 0, sources: 1, priority: 1, available: yes)
    		off: Off (sinks: 0, sources: 0, priority: 0, available: yes)
    	Active Profile: input:mono-fallback
    	Ports:
    		analog-input-mic: Microphone (priority: 8700, latency offset: 0 usec)
    			Properties:
    				device.icon_name = "audio-input-microphone"
    			Part of profile(s): input:mono-fallback
    		multichannel-input: Multichannel Input (priority: 0, latency offset: 0 usec)
    			Part of profile(s): input:multichannel-input
    

    I have tried feeding the "Name" value (alsa_card.usb-BLUE_MICROPHONE_Blue_Snowball_201603-00), the udev.id, and the device.icon_name (longshot) into the CLI, each time getting the error Stream error: No such entity. If I don't include the --pulse-device-name, dictation works fine, but I want to ensure it's getting the best input possible.

    Which of the values from the pactl list output should we use for that flag? Or is there another value further up in the stream - i.e. not "Card #2" - that I should be looking at?

    Thanks!

    opened by vrrobz 5
  • pa_context_connect() failed: Connection refused


    Hi, I'm trying to run nerd-dictation on Kubuntu 20.04. I created a virtualenv, activated it and installed vosk with pip3.

    I'm running nerd-dictation as the root user and I get

    ./nerd-dictation begin --vosk-model-dir=./model &
    pa_context_connect() failed: Connection refused
    

    (the process still runs on background). What is causing this error? Am I missing something?

    If I try to run it as a normal user, I get a permission error:

      File "./nerd-dictation", line 1188, in <module>
        main()
      File "./nerd-dictation", line 1184, in main
        args.func(args)
      File "./nerd-dictation", line 1107, in <lambda>
        func=lambda args: main_begin(
      File "./nerd-dictation", line 747, in main_begin
        touch(path_to_cookie)
      File "./nerd-dictation", line 65, in touch
        os.utime(filepath, None)
    PermissionError: [Errno 13] Permission denied
    

    I tried to change the ownership of the main folder and model/ folder so they belong to my current user, but I still get the error. I notice the error mention a "path_to_cookie" but I have no idea of what path it could be.

    opened by sirio81 5
  • Lots of numbers being spit out


    Thank you for writing this interesting project. It's running, but it's spitting out a lot of garbage along with the text.

    ❯ ./nerd-dictation begin 0.09997663497924805 0.09870014190673829 0.09955344200134278 0.09974346160888672 0.09971175193786622 0.0929502010345459 0.09946784973144532 0.09947595596313477 0.0925527572631836 0.09944138526916504 0.09245476722717286 0.09949836730957032 0.09236202239990235 0.09945592880249024 0.09939346313476563 0.0923090934753418 0.09901008605957032 THIS0.09907612800598145 IS0.039521551132202154 0.09932670593261719 0.09929046630859376 ANOTHER0.07741460800170899 0.09929213523864747 0.09936389923095704 TERRORIST0.015120840072631841 0.09926352500915528 ST0.09925565719604493 0.0896986484527588 0.09934697151184083 0.09947404861450196 0.09257588386535645 0.09938035011291504 0.09136066436767579 0.09934458732604981 0.06850967407226563 0.09943637847900391 0.09936747550964356 0.09154710769653321 0.09944114685058594 0.09195122718811036 0.09947142601013184

    How do I suppress all these logits?

    opened by MikeyBeez 5
  • 'huh' outputted after exiting


    Hello, thanks for creating this project! Very cool.

    I've noticed that huh is outputted after I stop nerd-dictation from running. Maybe outputted twice? I'm using the small English model from the install instructions.


    opened by makeworld-the-better-one 4
  • English text is out of order and includes extra characters


    The characters are strangely out of order. Using the vosk-model-en-us-0.22-lgraph.zip model. Saying "This is a test of the emergency broadcast system" multiple times:

    $ ./nerd-dictation begin --vosk-model-dir=./model --timeout=1.0
    this i tstfesa  o the mycnegeer broactdas ysstem
    $ ./nerd-dictation begin --vosk-model-dir=./model --timeout=1.0
    tihs is  atesoft  theem ergencbortsy adca systme
    $ ./nerd-dictation begin --vosk-model-dir=./model --timeout=1.0
    this is a se oftt the mereg aorbnecscdyta system
    

    The Vosk API test_microphone.py works correctly:

    $ python3 test_microphone.py
    LOG (VoskAPI:ReadDataFiles():model.cc:213) Decoding params beam=13 max-active=7000 lattice-beam=6
    LOG (VoskAPI:ReadDataFiles():model.cc:216) Silence phones 1:2:3:4:5:11:12:13:14:15
    LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
    LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
    LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.089 seconds in looped compilation.
    LOG (VoskAPI:ReadDataFiles():model.cc:248) Loading i-vector extractor from model/ivector/final.ie
    LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
    LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
    LOG (VoskAPI:ReadDataFiles():model.cc:281) Loading HCL and G from model/graph/HCLr.fst model/graph/Gr.fst
    LOG (VoskAPI:ReadDataFiles():model.cc:302) Loading winfo model/graph/phones/word_boundary.int
    ################################################################################
    Press Ctrl+C to stop the recording
    ################################################################################
    {
      "partial" : ""
    }
    <SNIP DUPLICATES>
    {
      "partial" : "this"
    }
    {
      "partial" : "this"
    }
    {
      "partial" : "this is"
    }
    {
      "partial" : "this is a"
    }
    {
      "partial" : "this is a test of"
    }
    {
      "partial" : "this is a test of"
    }
    {
      "partial" : "this is a test of the"
    }
    {
      "partial" : "this is a test of the emergency"
    }
    {
      "partial" : "this is a test of the emergency broadcast"
    }
    <SNIP DUPLICATES>
    {
      "partial" : "this is a test of the emergency broadcast system"
    }
    <SNIP DUPLICATES>
    {
      "text" : "this is a test of the emergency broadcast system"
    }
    {
      "partial" : ""
    }
    <SNIP DUPLICATES>
    ^C
    Done
    
    opened by 13rac1 4
  • What configuration script and nerd-dictation options did you use for your youtube video?


    I saw your video and wondered how you invoked nerd-dictation for that example? Did you have any special .py configuration or command line options?

    • https://www.youtube.com/watch?v=T7sR-4DFhpQ
    opened by KJ7LNW 0
  • PYNPUT support


    hello i am so glad that you wrote the software (i wrote this message using nerd dictation come a however i do not have punctuation fixed yet exclamation mark)

    While i did not write the patch for PYNPUT below, i thought you might be interested emerging it since it is an additional input method that does not depend on external tooling:

    • https://github.com/ideasman42/nerd-dictation/compare/master...mklcp:nerd-dictation:master
    opened by KJ7LNW 0
  • Sentence capitalization and punctuation not working as in demo


    First of all, thanks for creating this! I am very excited to see such an accurate, accessible and extendable voice typing solution on Linux!

    The youtube demo is very exciting and I would love to voice type with that accuracy. However, when I try to set it up myself, there seems to be no punctuation added. This is the command I am running:

    ./nerd-dictation begin --full-sentence --punctuate-from-previous-timeout 2 &
    

    The first sentence is capitalized but no matter how long I wait there is no punctuation added. I have tried with different models, and all produce the same result. I initially thought that there was some background noise keeping the mic alive, but the --timeout option works as expected so that can't be it. Adding --continuous makes each new sentence capitalize, but punctuation is still missing.

    After looking around a bit it seems like there is a bug that makes the variable is_run_on never evaluate to true, since age_in_seconds is always a really high number. However, even after manually setting is_run_on to True, I get only commas inserted unless I use the --full-sentence option, in which case only periods are inserted. How can I include a mix of the two as in the demo video? Instructions on how to replicate the behavior in the video would be very helpful.

    Related issues:

    • https://github.com/ideasman42/nerd-dictation/issues/63
    • https://github.com/ideasman42/nerd-dictation/issues/59
    • https://github.com/ideasman42/nerd-dictation/pull/50 (but in the video there is no need to say "comma" etc)
    opened by joelostblom 0
  • Support for OpenAI Whisper


    I'm wondering if it's on the roadmap to add support for OpenAI's Whisper. It could possibly be done using a packaged Docker container, something like this: https://github.com/ahmetoner/whisper-asr-webservice/

    I assume at this point the results would be superior. Maybe failing that there could be some notes about how to use another backend for the voice to text?

    Cheers and great work - thank you

    opened by nkeilar 2
  • On/off script + tray icon


    The included bash script (see below) can be linked to a desktop shortcut.

    1. it starts nerd-dictation in background
    2. and runs a Python script which places a tray icon on the XFCE panel. This allows changing the language and stopping the nerd-dictation daemon

    ADAPT THE 2 PROGRAMS:

    1. nerd-command.sh:
       1.1) line 3: the "cd" command in the bash script should be adapted to the user's folder organisation.
       1.2) line 12: the variable LANG sets the language loaded by the daemon.

    2. nerd-tray.py:
       2.1) Place model folders in the user's .config directory. Model folders should be renamed "model_xx" (e.g. model_US or model_FR).
       2.2) If you don't want to modify your organisation, adapt the line where the call to your model is made: "os.system("nerd-dictation begin --vosk-model-dir=$HOME/.config/nerd-dictation/model_" + current_label + " &")"

    NOTES:

    Rough, but works for me. I use the bash script as an on/off button. XFCE allows linking a keyboard shortcut to this script, making its use very comfortable. Hope this helps. Best regards.


    nerd-command.sh

    #!/bin/bash
    
    LANG=US
    
    if [[ ! "$(ps -o ppid= -C parec)" == "" ]] 
    then 
      nerd-dictation end
      kill -9 $(ps aux|grep 'python nerd-tray.py'|grep -v grep |awk '{print $2}')
      notify-send "nerd-ended" 
    
    else 
      cd $HOME/Documents/dotfiles/backup/Scripts
      python nerd-tray.py $LANG &
      notify-send "nerd-started" 
    fi
    

    nerd-tray.py

    #!/usr/bin/env python3
    
    import os
    import gi
    import sys
    
    gi.require_version("Gtk", "3.0")
    gi.require_version("AppIndicator3", "0.1")
    gi.require_version('Notify', '0.7')
    from gi.repository import Gtk as gtk
    from gi.repository import AppIndicator3 as appindicator
    # from gi.repository import Notify as notify
    
    LAUNCHERS = [
        {
            "label": "US",
            "icon": "/usr/share/xfce4/xkb/flags/us.svg",
            "command": "setLang",
        },
        {
            "label": "FR",
            "icon": "/usr/share/xfce4/xkb/flags/fr.svg",
            "command": "setLang",
        },
        {
            "sep": True,
        },
        {
            "label": "Stop",
            "icon": None,
            "command": "stopLang",
        },
        {
            "label": "Exit",
            "icon": None,
            "command": "quit",
        },
    ]
    
    APPINDICATOR_ID = 'nerd-tray'
    
    class IconoTray:
        def __init__(self, appid, iconname):
            self.menu = gtk.Menu()
            self.ind = appindicator.Indicator.new(appid, iconname, appindicator.IndicatorCategory.APPLICATION_STATUS)
            self.ind.set_status (appindicator.IndicatorStatus.ACTIVE)
            self.ind.set_menu(self.menu)
            # notify.init(APPINDICATOR_ID)
            # notify.Notification.new(appid, "started", None).show()
    
        def add_menu_item(self, label=None, icon=None, command=None, sep=False):
            if sep :
                item = gtk.SeparatorMenuItem()
            elif icon == None :
                item = gtk.MenuItem()
                item.set_label(label)
            else :
                img = gtk.Image()
                img.set_from_file(icon)
                item = gtk.ImageMenuItem(label=label)
                item.set_image(img)
            if command != None :
                item.connect("activate", getattr(self, command))
            self.menu.append(item)
            self.menu.show_all()
    
        def setLang(self, source):
            current_label = source.get_label()
            os.system("nerd-dictation end")
            os.system("nerd-dictation begin --vosk-model-dir=$HOME/.config/nerd-dictation/model_" + current_label + " &")
            for item in self.menu:
                item.set_sensitive(item.get_label() != current_label)
            self.ind.set_icon('audio-recorder-on')
            return
    
        def stopLang(self, source):
            os.system("nerd-dictation end")
            for item in self.menu:
                item.set_sensitive(True)
            self.ind.set_icon('notification-microphone-sensitivity-high')
            return
    
        def selectItem(self, label):
            for item in self.menu:
                if item.get_label() == label:
                    self.setLang(item)
            return
    
        def quit(self, source):
            os.system("nerd-dictation end")
            # notify.Notification.new("nerd-dictation", "stopped.", None).show()
            # notify.uninit()
            gtk.main_quit()
    
    def main():
        app = IconoTray(APPINDICATOR_ID, "notification-microphone-sensitivity-high")
        for launcher in LAUNCHERS:
            app.add_menu_item(**launcher)
        if len(sys.argv) >= 2 :
            app.selectItem(sys.argv[1])
        gtk.main()
    
    if __name__ == "__main__":
        main()
    
    opened by mdjames094 0
  • Can't make it run with Ydotool on fedora


    Hey everyone, first of all thanks for this amazing tool. I used it on my previous distro (Parrot OS) and it was working smoothly, but now I'm on Fedora 37 and I can't make it run with ydotool since I'm on Wayland. Can you give a more detailed workaround on how to set it up? Especially here: You should then place them in a place that's available on your $PATH environment variable.

    opened by ElSamiru 4
Owner
Campbell Barton