Simple, hackable offline speech to text - using the VOSK-API.

Overview

Nerd Dictation

Offline Speech to Text for Desktop Linux.

This is a utility that provides simple access to speech to text for use on Linux, without being tied to a desktop environment.

Simple
This is a single file Python script with minimal dependencies.
Hackable
User configuration lets you manipulate text using Python string operations.
Zero Overhead
As this relies on manual activation there are no background processes.

Dictation is accessed manually with begin/end commands.

This uses the excellent vosk-api.

Usage

It is suggested to bind begin/end/cancel to shortcut keys.

nerd-dictation begin
nerd-dictation end

For details on how this can be used, see: nerd-dictation --help and nerd-dictation begin --help.

Features

Specific features include:

Numbers as Digits

Optional conversion from numbers to digits.

So Three million five hundred and sixty second becomes 3,000,562nd.

A series of numbers (such as reciting a phone number) is also supported.

So Two four six eight becomes 2,468.
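The digit-series behaviour can be sketched in plain Python. This is an illustrative sketch only, not nerd-dictation's actual implementation, and the optional thousands separator is left out:

```python
# Illustrative sketch of spoken digit-series conversion
# (not nerd-dictation's own code).
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

def digit_series(text):
    """Convert a run of spoken digits ("two four six eight") to "2468"."""
    words = text.lower().split()
    if words and all(w in DIGITS for w in words):
        return "".join(DIGITS[w] for w in words)
    return text  # leave non-digit text untouched
```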

Time Out
Optionally end speech to text early when no speech is detected for a given number of seconds (avoiding the explicit call to end that is otherwise required).
Output Type
Output can simulate keystroke events (default) or simply print to the standard output.
User Configuration Script
User configuration is just a Python script which can be used to manipulate text using Python's full feature set.

See nerd-dictation begin --help for details on how to access these options.

Dependencies

  • Python 3.
  • The VOSK-API.
  • parec command (for recording from pulse-audio).
  • xdotool command to simulate keyboard input.

Install

pip3 install vosk
git clone https://github.com/ideasman42/nerd-dictation.git
cd nerd-dictation
wget https://alphacephei.com/kaldi/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 model

To test dictation:

./nerd-dictation begin --vosk-model-dir=./model &
# Start speaking.
./nerd-dictation end
  • Reminder that it's up to you to bind begin/end/cancel to actions you can easily access (typically key shortcuts).

  • To avoid having to pass the --vosk-model-dir argument, copy the model to the default path:

    mkdir -p ~/.config/nerd-dictation
    mv ./model ~/.config/nerd-dictation

Hint

Once this is working properly you may wish to download one of the larger language models for more accurate dictation. They are available here.

Configuration

This is an example of a trivial configuration file which simply makes the input text uppercase.

# ~/.config/nerd-dictation/nerd-dictation.py
def nerd_dictation_process(text):
    return text.upper()

A more comprehensive configuration is included in the examples/ directory.
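As another small sketch of the same idea, a configuration could replace spoken phrases with punctuation. The phrase table here is a hypothetical example, not part of nerd-dictation:

```python
# ~/.config/nerd-dictation/nerd-dictation.py -- phrase-replacement sketch.
# The phrase table is a hypothetical example; adjust to taste.
REPLACEMENTS = {
    "full stop": ".",
    "comma": ",",
}

def nerd_dictation_process(text):
    # Replace " full stop" with "." etc., consuming the preceding space
    # so punctuation attaches to the previous word.
    for phrase, symbol in REPLACEMENTS.items():
        text = text.replace(" " + phrase, symbol)
    return text
```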

Hints

  • The processing function can be used to implement your own actions using keywords of your choice. Simply return a blank string if you have implemented your own text handling.
  • Context sensitive actions can be implemented using command line utilities to access the active window.
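The first hint above might be sketched as follows; the spoken phrases and commands are hypothetical examples, not part of nerd-dictation:

```python
# ~/.config/nerd-dictation/nerd-dictation.py -- keyword-action sketch.
# The phrases and commands below are hypothetical examples.
import subprocess

ACTIONS = {
    "open terminal": ["xterm"],
    "lock screen": ["xdg-screensaver", "lock"],
}

def lookup_action(text):
    """Return the command mapped to the spoken phrase, or None."""
    return ACTIONS.get(text.strip().lower())

def nerd_dictation_process(text):
    action = lookup_action(text)
    if action is not None:
        subprocess.Popen(action)  # run the mapped command
        return ""  # blank string: nothing gets typed
    return text
```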

Paths

Local Configuration
~/.config/nerd-dictation/nerd-dictation.py
Language Model

~/.config/nerd-dictation/model

Note that --vosk-model-dir=PATH can be used to override the default.

Details

  • Typing in results will never press enter/return.
  • Pulse audio is used for recording.
  • Recording and speech to text are performed in parallel.

Examples

Store the result of speech to text as a variable in the shell:

SPEECH="$(nerd-dictation begin --timeout=1.0 --output=STDOUT)"
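The same result can be captured from Python via subprocess. This is a sketch that assumes nerd-dictation is on your $PATH; the call blocks until the timeout elapses with no speech:

```python
# Sketch: capture dictated text from Python rather than the shell.
import subprocess

def build_command(timeout=1.0):
    return ["nerd-dictation", "begin", f"--timeout={timeout}", "--output=STDOUT"]

def dictate(timeout=1.0):
    # Blocks while recording; returns the recognized text.
    result = subprocess.run(build_command(timeout), capture_output=True, text=True)
    return result.stdout.strip()
```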

Limitations

  • Text from VOSK is all lower-case. While the user configuration can be used to set the case of common words like "I", this isn't very convenient (see the example configuration for details).

  • For some users the start-up delay may be noticeable on systems with slower hard disks, especially when running for the first time (a cold start).

    This is a limitation of the choice not to use a background service. To mitigate the problem, recording begins before the speech-to-text components are loaded.
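The lower-case limitation can be softened with a small configuration sketch; the word table here is a hypothetical example:

```python
# ~/.config/nerd-dictation/nerd-dictation.py -- capitalization sketch.
# The word table is a hypothetical example; extend it as needed.
CAPITALIZE = {"i": "I", "i'm": "I'm", "i've": "I've", "linux": "Linux"}

def nerd_dictation_process(text):
    # Replace each word that has a cased form in the table.
    return " ".join(CAPITALIZE.get(w, w) for w in text.split())
```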

Further Work

  • Add a general solution to capitalize words (proper nouns for example).
  • Preview output while dictating.
  • Wayland support (this should be quite simple to support and mainly relies on a replacement for xdotool).
  • Add a setup.py for easy installation on users' systems.
  • Possibly other speech to text engines (only if they provide some significant benefits).
  • Possibly support Windows & macOS.
Comments
  • Packaging

    Packaging

    Hello, I have the idea to package nerd-dictation for PyPI. I tested adding a setup.py and setup.cfg file, treating the nerd-dictation file as a module and adding a console-script entry. At this step I'm facing a problem: the name nerd-dictation is not allowed because of the dash, and `import nerd-dictation` is a syntax error. Could the name be changed to nerd_dictation instead of nerd-dictation? I haven't yet explored another way that avoids the module/console-script approach and installs the nerd-dictation script directly. What do you think about that? The background idea is to distribute it for easy installation with pip install, and also so that elograf can require it as a dependency.

    opened by papoteur-mga 8
  • xdotool: freezes the OS

    xdotool: freezes the OS

    When I run the program by assigning the command "nerd-dictation begin --timeout 1 --numbers-as-digits --numbers-use-separator" to a custom keyboard shortcut on Ubuntu 20.04, it seems to freeze every single time. Any fixes for this? It behaves like a memory leak and completely crashes the OS.

    opened by 52617365 8
  • Add shell.nix and package vosk

    Add shell.nix and package vosk

    Greetings, started playing around with this the other day. I run NixOS so I had to lay some groundwork first. Thought others might appreciate it too.

    It just drops you into a nix-shell with the required packages so you can run nerd-dictation. It packages a couple of the English models. Easy enough to copy for other language models though. :)

    opened by mankyKitty 6
  • Is using nerd-dictation to control software a solved problem?

    Is using nerd-dictation to control software a solved problem?

    I want to use nerd-dictation for processing my photos, basically:

    • show photo
    • wait for command (next previous delete promote)
    • if command is detected: show what was detected (or produce sound feedback?), execute action

    I am not entirely sure what would be the best way to implement this - has anyone done something like that already? It seems a relatively obvious use of actually working voice-to-text.

    (maybe using nerd-dictation is a mistake and I should be using vosk API directly?)

    question 
    opened by matkoniecz 6
  • No keystrokes appear in LibreOffice Writer

    No keystrokes appear in LibreOffice Writer

    With some recent upgrade of either Ubuntu or LibreOffice, I have noticed that I cannot use nerd-dictation in LibreOffice Writer. No text appears. nerd-dictation works fine in Chrome or Thunderbird windows. It did not used to be this way. I upgraded from Ubuntu 18 to 21.10 recently, so perhaps something changed in that period - maybe there's some security policy that prevents simulated keystrokes? Just a guess. LibreOffice is 7.2.3.2.

    opened by xenotropic 5
  • Russian input lags entire interface

    Russian input lags entire interface

    Russian input lags entire interface. But some programs (Blender for example) don't lag at all (also Blender usually launched in fullscreen). English input works fine. Model: "vosk-model-small-ru-0.22"

    opened by scaledteam 5
  • What is the correct format for --pulse-device-name?

    What is the correct format for --pulse-device-name?

    First off - thank you. This is precisely what I have been looking for. Great work here!

    I want to ensure that the program is using the right microphone - I want to make sure it uses the external one, not the one on my laptop. Running pactl list gives me a WHOLE slew of stuff, but I think this is the chunk I'm most interested in, since it lists my external microphone:

    Card #2
    	Name: alsa_card.usb-BLUE_MICROPHONE_Blue_Snowball_201603-00
    	Driver: module-alsa-card.c
    	Owner Module: 28
    	Properties:
    		alsa.card = "1"
    		alsa.card_name = "Blue Snowball"
    		alsa.long_card_name = "BLUE MICROPHONE Blue Snowball at usb-0000:00:14.0-3, full speed"
    		alsa.driver_name = "snd_usb_audio"
    		device.bus_path = "pci-0000:00:14.0-usb-0:3:1.0"
    		sysfs.path = "/devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3:1.0/sound/card1"
    		udev.id = "usb-BLUE_MICROPHONE_Blue_Snowball_201603-00"
    		device.bus = "usb"
    		device.vendor.id = "0d8c"
    		device.vendor.name = "C-Media Electronics, Inc."
    		device.product.id = "0005"
    		device.product.name = "Blue Snowball"
    		device.serial = "BLUE_MICROPHONE_Blue_Snowball_201603"
    		device.string = "1"
    		device.description = "Blue Snowball"
    		module-udev-detect.discovered = "1"
    		device.icon_name = "audio-card-usb"
    	Profiles:
    		input:mono-fallback: Mono Input (sinks: 0, sources: 1, priority: 1, available: yes)
    		input:multichannel-input: Multichannel Input (sinks: 0, sources: 1, priority: 1, available: yes)
    		off: Off (sinks: 0, sources: 0, priority: 0, available: yes)
    	Active Profile: input:mono-fallback
    	Ports:
    		analog-input-mic: Microphone (priority: 8700, latency offset: 0 usec)
    			Properties:
    				device.icon_name = "audio-input-microphone"
    			Part of profile(s): input:mono-fallback
    		multichannel-input: Multichannel Input (priority: 0, latency offset: 0 usec)
    			Part of profile(s): input:multichannel-input
    

    I have tried feeding the "Name" value (alsa_card.usb-BLUE_MICROPHONE_Blue_Snowball_201603-00), the udev.id, and the device.icon_name (longshot) into the CLI, each time getting the error Stream error: No such entity. If I don't include the --pulse-device-name, dictation works fine, but I want to ensure it's getting the best input possible.

    Which of the values from the pactl list output should we use for that flag? Or is there another value further up in the output - i.e. not under "Card #2" - that I should be looking at?

    Thanks!

    opened by vrrobz 5
  • pa_context_connect() failed: Connection refused

    pa_context_connect() failed: Connection refused

    Hi, I'm trying to run nerd-dictation on Kubuntu 20.04. I created a virtualenv, activated it, and installed vosk with pip3.

    I'm running nerd-dictation as the root user and I get

    ./nerd-dictation begin --vosk-model-dir=./model &
    pa_context_connect() failed: Connection refused
    

    (the process still runs in the background). What is causing this error? Am I missing something?

    If I try to run it as a normal user, I get a permission error:

      File "./nerd-dictation", line 1188, in <module>
        main()
      File "./nerd-dictation", line 1184, in main
        args.func(args)
      File "./nerd-dictation", line 1107, in <lambda>
        func=lambda args: main_begin(
      File "./nerd-dictation", line 747, in main_begin
        touch(path_to_cookie)
      File "./nerd-dictation", line 65, in touch
        os.utime(filepath, None)
    PermissionError: [Errno 13] Permission denied
    

    I tried changing the ownership of the main folder and the model/ folder so they belong to my current user, but I still get the error. I notice the error mentions a "path_to_cookie" but I have no idea what path that could be.

    opened by sirio81 5
  • Lots of numbers being spit out

    Lots of numbers being spit out

    Thank you for writing this interesting project. It's running, but it's spitting out a lot of garbage along with the text.

    ❯ ./nerd-dictation begin 0.09997663497924805 0.09870014190673829 0.09955344200134278 0.09974346160888672 0.09971175193786622 0.0929502010345459 0.09946784973144532 0.09947595596313477 0.0925527572631836 0.09944138526916504 0.09245476722717286 0.09949836730957032 0.09236202239990235 0.09945592880249024 0.09939346313476563 0.0923090934753418 0.09901008605957032 THIS0.09907612800598145 IS0.039521551132202154 0.09932670593261719 0.09929046630859376 ANOTHER0.07741460800170899 0.09929213523864747 0.09936389923095704 TERRORIST0.015120840072631841 0.09926352500915528 ST0.09925565719604493 0.0896986484527588 0.09934697151184083 0.09947404861450196 0.09257588386535645 0.09938035011291504 0.09136066436767579 0.09934458732604981 0.06850967407226563 0.09943637847900391 0.09936747550964356 0.09154710769653321 0.09944114685058594 0.09195122718811036 0.09947142601013184

    How do I suppress all these logits?

    opened by MikeyBeez 5
  • 'huh' outputted after exiting

    'huh' outputted after exiting

    Hello, thanks for creating this project! Very cool.

    I've noticed that huh is outputted after I stop nerd-dictation from running. Maybe outputted twice? I'm using the small English model from the install instructions.

    (screenshot attached to the original issue)

    opened by makeworld-the-better-one 4
  • English text is out of order and includes extra characters

    English text is out of order and includes extra characters

    The characters are strangely out of order. Using the vosk-model-en-us-0.22-lgraph.zip model. Saying "This is a test of the emergency broadcast system" multiple times:

    $ ./nerd-dictation begin --vosk-model-dir=./model --timeout=1.0
    this i tstfesa  o the mycnegeer broactdas ysstem
    $ ./nerd-dictation begin --vosk-model-dir=./model --timeout=1.0
    tihs is  atesoft  theem ergencbortsy adca systme
    $ ./nerd-dictation begin --vosk-model-dir=./model --timeout=1.0
    this is a se oftt the mereg aorbnecscdyta system
    

    The Vosk API test_microphone.py works correctly:

    $ python3 test_microphone.py
    LOG (VoskAPI:ReadDataFiles():model.cc:213) Decoding params beam=13 max-active=7000 lattice-beam=6
    LOG (VoskAPI:ReadDataFiles():model.cc:216) Silence phones 1:2:3:4:5:11:12:13:14:15
    LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
    LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
    LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.089 seconds in looped compilation.
    LOG (VoskAPI:ReadDataFiles():model.cc:248) Loading i-vector extractor from model/ivector/final.ie
    LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
    LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
    LOG (VoskAPI:ReadDataFiles():model.cc:281) Loading HCL and G from model/graph/HCLr.fst model/graph/Gr.fst
    LOG (VoskAPI:ReadDataFiles():model.cc:302) Loading winfo model/graph/phones/word_boundary.int
    ################################################################################
    Press Ctrl+C to stop the recording
    ################################################################################
    {
      "partial" : ""
    }
    <SNIP DUPLICATES>
    {
      "partial" : "this"
    }
    {
      "partial" : "this"
    }
    {
      "partial" : "this is"
    }
    {
      "partial" : "this is a"
    }
    {
      "partial" : "this is a test of"
    }
    {
      "partial" : "this is a test of"
    }
    {
      "partial" : "this is a test of the"
    }
    {
      "partial" : "this is a test of the emergency"
    }
    {
      "partial" : "this is a test of the emergency broadcast"
    }
    <SNIP DUPLICATES>
    {
      "partial" : "this is a test of the emergency broadcast system"
    }
    <SNIP DUPLICATES>
    {
      "text" : "this is a test of the emergency broadcast system"
    }
    {
      "partial" : ""
    }
    <SNIP DUPLICATES>
    ^C
    Done
    
    opened by 13rac1 4
  • What configuration script and nerd-dictation options did you use for your youtube video?

    What configuration script and nerd-dictation options did you use for your youtube video?

    I saw your video and wondered how you invoked nerd-dictation for that example? Did you have any special .py configuration or command line options?

    • https://www.youtube.com/watch?v=T7sR-4DFhpQ
    opened by KJ7LNW 0
  • PYNPUT support

    PYNPUT support

    hello i am so glad that you wrote the software (i wrote this message using nerd dictation come a however i do not have punctuation fixed yet exclamation mark)

    While I did not write the patch for pynput below, I thought you might be interested in merging it, since it is an additional input method that does not depend on external tooling:

    • https://github.com/ideasman42/nerd-dictation/compare/master...mklcp:nerd-dictation:master
    opened by KJ7LNW 0
  • Sentence capitalization and punctuation not working as in demo

    Sentence capitalization and punctuation not working as in demo

    First of all, thanks for creating this! I am very excited to see such an accurate, accessible and extendable voice typing solution on Linux!

    The youtube demo is very exciting and I would love to voice type with that accuracy. However, when I try to set it up myself, there seems to be no punctuation added. This is the command I am running:

    ./nerd-dictation begin --full-sentence --punctuate-from-previous-timeout 2 &
    

    The first sentence is capitalized but no matter how long I wait there is no punctuation added. I have tried with different models, and all produce the same result. I initially thought that there was some background noise keeping the mic alive, but the --timeout option works as expected so that can't be it. Adding --continuous makes each new sentence capitalize, but punctuation is still missing.

    After looking around a bit it seems like there is a bug that makes the variable is_run_on never evaluate to true, since age_in_seconds is always a really high number. However, even after manually setting is_run_on to True, I get only commas inserted unless I use the --full-sentence option, in which case only periods are inserted. How can I include a mix of the two as in the demo video? Instructions on how to replicate the behavior in the video would be very helpful.

    Related issues:

    • https://github.com/ideasman42/nerd-dictation/issues/63
    • https://github.com/ideasman42/nerd-dictation/issues/59
    • https://github.com/ideasman42/nerd-dictation/pull/50 (but in the video there is no need to say "comma" etc)
    opened by joelostblom 0
  • Support for OpenAI Whisper

    Support for OpenAI Whisper

    I'm wondering if it's on the roadmap to add support for OpenAI's Whisper. It could possibly be done using a packaged Docker container, something like this: https://github.com/ahmetoner/whisper-asr-webservice/

    I assume at this point the results would be superior. Failing that, maybe there could be some notes about how to use another backend for the voice to text?

    Cheers and great work - thank you

    opened by nkeilar 2
  • On/off script + tray icon

    On/off script + tray icon

    The included bash script (see below) can be linked to a desktop shortcut.

    1. it starts nerd-dictation in the background
    2. and runs a Python script which places a tray icon on the XFCE panel, allowing you to change the language and stop the nerd-dictation daemon

    ADAPT THE 2 PROGRAMS:

    1. nerd-command.sh:
      1.1) line 3: the "cd" command in the bash script should be adapted to the user's folder organisation.
      1.2) line 12: the variable LANG sets the language loaded by the daemon.

    2. nerd-tray.py:
      2.1) Place model folders in the user's .config directory. Model folders should be renamed "model_xx" (e.g. model_US or model_FR).
      2.2) If you don't want to modify your organisation, adapt the line where the call to your model is made: "os.system("nerd-dictation begin --vosk-model-dir=$HOME/.config/nerd-dictation/model_" + current_label + " &")"

    NOTES:

    Rough, but it works for me. I use the bash script as an on/off button. XFCE allows linking a keyboard shortcut to this script, making its use very comfortable. Hope this helps. Best regards.


    nerd-command.sh

    #!/bin/bash
    
    LANG=US
    
    if [[ ! "$(ps -o ppid= -C parec)" == "" ]] 
    then 
      nerd-dictation end
      kill -9 $(ps aux|grep 'python nerd-tray.py'|grep -v grep |awk '{print $2}')
      notify-send "nerd-ended" 
    
    else 
      cd $HOME/Documents/dotfiles/backup/Scripts
      python nerd-tray.py $LANG &
      notify-send "nerd-started" 
    fi
    

    nerd-tray.py

    #!/usr/bin/env python3
    
    import os
    import gi
    import sys
    
    gi.require_version("Gtk", "3.0")
    gi.require_version("AppIndicator3", "0.1")
    gi.require_version('Notify', '0.7')
    from gi.repository import Gtk as gtk
    from gi.repository import AppIndicator3 as appindicator
    # from gi.repository import Notify as notify
    
    LAUNCHERS = [
        {
            "label": "US",
            "icon": "/usr/share/xfce4/xkb/flags/us.svg",
            "command": "setLang",
        },
        {
            "label": "FR",
            "icon": "/usr/share/xfce4/xkb/flags/fr.svg",
            "command": "setLang",
        },
        {
            "sep": True,
        },
        {
            "label": "Stop",
            "icon": None,
            "command": "stopLang",
        },
        {
            "label": "Exit",
            "icon": None,
            "command": "quit",
        },
    ]
    
    APPINDICATOR_ID = 'nerd-tray'
    
    class IconoTray:
        def __init__(self, appid, iconname):
            self.menu = gtk.Menu()
            self.ind = appindicator.Indicator.new(appid, iconname, appindicator.IndicatorCategory.APPLICATION_STATUS)
            self.ind.set_status (appindicator.IndicatorStatus.ACTIVE)
            self.ind.set_menu(self.menu)
            # notify.init(APPINDICATOR_ID)
            # notify.Notification.new(appid, "started", None).show()
    
        def add_menu_item(self, label=None, icon=None, command=None, sep=False):
            if sep :
                item = gtk.SeparatorMenuItem()
            elif icon == None :
                item = gtk.MenuItem()
                item.set_label(label)
            else :
                img = gtk.Image()
                img.set_from_file(icon)
                item = gtk.ImageMenuItem(label=label)
                item.set_image(img)
            if command != None :
                item.connect("activate", getattr(self, command))
            self.menu.append(item)
            self.menu.show_all()
    
        def setLang(self, source):
            current_label = source.get_label()
            os.system("nerd-dictation end")
            os.system("nerd-dictation begin --vosk-model-dir=$HOME/.config/nerd-dictation/model_" + current_label + " &")
            for item in self.menu:
                item.set_sensitive(item.get_label() != current_label)
            self.ind.set_icon('audio-recorder-on')
            return
    
        def stopLang(self, source):
            os.system("nerd-dictation end")
            for item in self.menu:
                item.set_sensitive(True)
            self.ind.set_icon('notification-microphone-sensitivity-high')
            return
    
        def selectItem(self, label):
            for item in self.menu:
                if item.get_label() == label:
                    self.setLang(item)
            return
    
        def quit(self, source):
            os.system("nerd-dictation end")
            # notify.Notification.new("nerd-dictation", "stopped.", None).show()
            # notify.uninit()
            gtk.main_quit()
    
    def main():
        app = IconoTray(APPINDICATOR_ID, "notification-microphone-sensitivity-high")
        for launcher in LAUNCHERS:
            app.add_menu_item(**launcher)
        if len(sys.argv) >= 2 :
            app.selectItem(sys.argv[1])
        gtk.main()
    
    if __name__ == "__main__":
        main()
    
    opened by mdjames094 0
  • Can't make it run with Ydotool on fedora

    Can't make it run with Ydotool on fedora

    Hey everyone, first of all thanks for this amazing tool. I used it on my previous distro (Parrot OS) and it was working smoothly, but now I'm on Fedora 37 and I can't make it run with ydotool since I'm on Wayland. Can you give a more detailed workaround on how to set it up? Especially here: "You should then place them in a place that's available on your $PATH environment variable."

    opened by ElSamiru 3
Owner
Campbell Barton