Offline Speech to Text for Desktop Linux.
This is a utility that provides simple access to speech to text on Linux without being tied to a desktop environment.
- This is a single file Python script with minimal dependencies.
- User configuration lets you manipulate text using Python string operations.
- Zero Overhead
- As this relies on manual activation there are no background processes.
Dictation is accessed manually with begin/end commands.
This uses the excellent vosk-api.
It is suggested to bind begin/end/cancel to shortcut keys.
For details on how this can be used, see:
nerd-dictation --help and
nerd-dictation begin --help.
Specific features include:
- Numbers as Digits
Optional conversion from numbers to digits.
Three million five hundred and sixty second becomes
A series of numbers (such as reciting a phone number) is also supported.
Two four six eight becomes
- Time Out
Optionally end speech to text early when no speech is detected for a given number of seconds (without an explicit call to end, which is otherwise required).
- Output Type
- Output can simulate keystroke events (default) or simply print to the standard output.
- User Configuration Script
- User configuration is just a Python script which can be used to manipulate text using Python's full feature set.
See nerd-dictation begin --help for details on how to access these options.
- Python 3.
- The VOSK-API.
- The parec command (for recording from pulse-audio).
- The xdotool command to simulate keyboard input.
pip3 install vosk
git clone https://github.com/ideasman42/nerd-dictation.git
cd nerd-dictation
wget https://alphacephei.com/kaldi/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 model
To test dictation:
./nerd-dictation begin --vosk-model-dir=./model &
# Start speaking.
./nerd-dictation end
Reminder that it's up to you to bind begin/end/cancel to actions you can easily access (typically key shortcuts).
To avoid having to pass the --vosk-model-dir argument, copy the model to the default path:
mkdir -p ~/.config/nerd-dictation
mv ./model ~/.config/nerd-dictation
Once this is working properly you may wish to download one of the larger language models for more accurate dictation. They are available here.
This is an example of a trivial configuration file which simply makes the input text uppercase.
# ~/.config/nerd-dictation/nerd-dictation.py
def nerd_dictation_process(text):
    return text.upper()
A more comprehensive configuration is included in the
- The processing function can be used to implement your own actions using keywords of your choice. Simply return a blank string if you have implemented your own text handling.
- Context sensitive actions can be implemented using command line utilities to access the active window.
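As a rough illustration of the keyword hint above, the sketch below maps a spoken phrase to a command and returns a blank string so nothing is typed. The KEYWORD_ACTIONS table, the "open terminal" phrase, the xterm command, and the run_action helper are all hypothetical names chosen for this example; only nerd_dictation_process(text) comes from the configuration shown earlier.

```python
# ~/.config/nerd-dictation/nerd-dictation.py
import subprocess

# Hypothetical keyword table: phrases and commands are illustrative only.
KEYWORD_ACTIONS = {
    "open terminal": ["xterm"],
}


def run_action(command):
    # Launch the command without waiting, so dictation is not blocked.
    subprocess.Popen(command)


def nerd_dictation_process(text, run=run_action):
    # `run` is an extra optional argument (an assumption of this sketch)
    # that makes the function easy to exercise without launching anything.
    command = KEYWORD_ACTIONS.get(text.strip())
    if command is not None:
        run(command)
        # A blank string means there is no text left to type.
        return ""
    return text
```

This sketch only matches when the whole dictated text equals a keyword; matching keywords inside longer sentences would need extra parsing.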
- Local Configuration
- Language Model
--vosk-model-dir=PATH can be used to override the default.
- Typing in results will never press enter/return.
- Pulse audio is used for recording.
- Recording and speech to text are performed in parallel.
Store the result of speech to text as a variable in the shell:
SPEECH="$(nerd-dictation begin --timeout=1.0 --output=STDOUT)"
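The same capture can be sketched from Python by running the command and reading its standard output. The helper names build_begin_command and capture_speech are hypothetical; the sketch assumes nerd-dictation is on the PATH and uses only the --timeout and --output=STDOUT options shown above.

```python
import subprocess


def build_begin_command(timeout=1.0, executable="nerd-dictation"):
    # Mirrors the shell example: begin --timeout=1.0 --output=STDOUT
    return [executable, "begin", f"--timeout={timeout}", "--output=STDOUT"]


def capture_speech(timeout=1.0, run=subprocess.run):
    # `run` is injectable (an assumption of this sketch) so the function
    # can be tried out without a microphone or a VOSK model.
    result = run(
        build_begin_command(timeout),
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    # Like "$(...)" in the shell, drop trailing newlines.
    return result.stdout.rstrip("\n")
```

Calling capture_speech() blocks until dictation ends, which here happens once no speech has been detected for the timeout period.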
Text from VOSK is all lower-case. While the user configuration can be used to set the case of common words like "I", this isn't very convenient (see the example configuration for details).
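A minimal sketch of such a case fix in the user configuration might look like the following. The WORD_REPLACE table and its entries are hypothetical and would need to be maintained by hand, which is the inconvenience described above; this is not the bundled example configuration.

```python
# ~/.config/nerd-dictation/nerd-dictation.py

# Hypothetical table of lower-case words and their preferred spelling.
WORD_REPLACE = {
    "i": "I",
    "i'm": "I'm",
    "linux": "Linux",
}


def nerd_dictation_process(text):
    # Replace each whole word found in the table, leave the rest untouched.
    words = text.split(" ")
    return " ".join(WORD_REPLACE.get(word, word) for word in words)
```

Because the lookup is per whole word, a word such as "inside" is not affected by the "i" entry.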
For some users the delay in start up may be noticeable on systems with slower hard disks, especially when running for the first time (a cold start).
This is a limitation of the choice not to use a service that runs in the background. Recording begins before any of the speech-to-text components are loaded to mitigate this problem.
- Add a general solution to capitalize words (proper nouns for example).
- Preview output while dictating.
- Wayland support (this should be quite simple to support and mainly relies on a replacement for
- Add a setup.py for easy installation on users' systems.
- Possibly other speech to text engines (only if they provide some significant benefits).
- Possibly support Windows & macOS.