Audio fingerprinting and recognition in Python


dejavu

Audio fingerprinting and recognition algorithm implemented in Python; see the explanation in How does it work? below.

Dejavu can memorize audio by listening to it once and fingerprinting it. Then by playing a song and recording microphone input or reading from disk, Dejavu attempts to match the audio against the fingerprints held in the database, returning the song being played.

Note: for voice recognition, Dejavu is not the right tool! Dejavu excels at recognition of exact signals with reasonable amounts of noise.

Quickstart with Docker

First, install Docker.

# build and then run our containers
$ docker-compose build
$ docker-compose up -d

# get a shell inside the container
$ docker-compose run python /bin/bash
Starting dejavu_db_1 ... done
root@f9ea95ce5cea:/code# python example_docker_postgres.py 
Fingerprinting channel 1/2 for test/woodward_43s.wav
Fingerprinting channel 1/2 for test/sean_secs.wav
...

# connect to the database and poke around
root@f9ea95ce5cea:/code# psql -h db -U postgres dejavu
Password for user postgres:  # type "password", as specified in the docker-compose.yml !
psql (11.7 (Debian 11.7-0+deb10u1), server 10.7)
Type "help" for help.

dejavu=# \dt
            List of relations
 Schema |     Name     | Type  |  Owner   
--------+--------------+-------+----------
 public | fingerprints | table | postgres
 public | songs        | table | postgres
(2 rows)

dejavu=# select * from fingerprints limit 5;
          hash          | song_id | offset |        date_created        |       date_modified        
------------------------+---------+--------+----------------------------+----------------------------
 \x71ffcb900d06fe642a18 |       1 |    137 | 2020-06-03 05:14:19.400153 | 2020-06-03 05:14:19.400153
 \xf731d792977330e6cc9f |       1 |    148 | 2020-06-03 05:14:19.400153 | 2020-06-03 05:14:19.400153
 \x71ff24aaeeb55d7b60c4 |       1 |    146 | 2020-06-03 05:14:19.400153 | 2020-06-03 05:14:19.400153
 \x29349c79b317d45a45a8 |       1 |    101 | 2020-06-03 05:14:19.400153 | 2020-06-03 05:14:19.400153
 \x5a052144e67d2248ccf4 |       1 |    123 | 2020-06-03 05:14:19.400153 | 2020-06-03 05:14:19.400153
(5 rows)

# then to shut it all down...
$ docker-compose down

If you want to be able to use the microphone with the Docker container, you'll need to do a little extra work. I haven't had the time to write this up, but if anyone wants to make a PR, I'll happily merge.

Docker alternative on local machine

Follow instructions in INSTALLATION.md

Next, you'll need to create a MySQL database where Dejavu can store fingerprints. For example, on your local setup:

$ mysql -u root -p
Enter password: **********
mysql> CREATE DATABASE IF NOT EXISTS dejavu;

Now you're ready to start fingerprinting your audio collection!

You may also use Postgres, of course. The same method applies.

Fingerprinting

Let's say we want to fingerprint all of July 2013's VA US Top 40 hits.

Start by creating a Dejavu object with your configuration settings (Dejavu takes an ordinary Python dictionary for the settings).

>>> from dejavu import Dejavu
>>> config = {
...     "database": {
...         "host": "127.0.0.1",
...         "user": "root",
...         "password": <password above>, 
...         "database": <name of the database you created above>,
...     }
... }
>>> djv = Dejavu(config)

Next, give the fingerprint_directory method three arguments:

  • input directory to look for audio files
  • audio extensions to look for in the input directory
  • number of processes (optional)
>>> djv.fingerprint_directory("va_us_top_40/mp3", [".mp3"], 3)

For a large number of files, this will take a while. However, Dejavu is robust enough that you can kill and restart it without affecting progress: Dejavu remembers which songs it has fingerprinted and converted and which it hasn't, and so won't repeat itself.

You'll have a lot of fingerprints once it completes a large folder of mp3s:

>>> print(djv.db.get_num_fingerprints())
5442376

Also, any subsequent calls to fingerprint_file or fingerprint_directory will fingerprint those songs and add them to the database as well. This is meant to simulate a system where, as new songs are released, they are fingerprinted and added to the database seamlessly without stopping the system.

Configuration options

The configuration object to the Dejavu constructor must be a dictionary.

The following keys are mandatory:

  • database, whose value is a dictionary with keys that the database you are using will accept. For example, with MySQL, the keys can be anything that the MySQLdb.connect() function will accept.

The following keys are optional:

  • fingerprint_limit: allows you to control how many seconds of each audio file to fingerprint. Leaving out this key, or using -1 or None, will cause Dejavu to fingerprint the entire audio file. Default value is None.
  • database_type: mysql (the default value) and postgres are supported. If you'd like to add another subclass for BaseDatabase and implement a new type of database, please fork and send a pull request!

An example configuration is as follows:

>>> from dejavu import Dejavu
>>> config = {
...     "database": {
...         "host": "127.0.0.1",
...         "user": "root",
...         "password": "Password123", 
...         "database": "dejavu_db",
...     },
...     "database_type" : "mysql",
...     "fingerprint_limit" : 10
... }
>>> djv = Dejavu(config)

Tuning

Inside config/settings.py, you may want to adjust the following parameters (some values are given below).

FINGERPRINT_REDUCTION = 30
PEAK_SORT = False
DEFAULT_OVERLAP_RATIO = 0.4
DEFAULT_FAN_VALUE = 5
DEFAULT_AMP_MIN = 10
PEAK_NEIGHBORHOOD_SIZE = 10

These parameters are described in detail within that file. Read it to understand the impact of changing these values.
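As a rough intuition for one of these knobs: each spectrogram peak is paired with up to DEFAULT_FAN_VALUE subsequent peaks when forming hashes, so the fingerprint count (and therefore storage) grows roughly linearly with the fan value. A back-of-the-envelope sketch (the helper and numbers here are illustrative, not dejavu's code):

```python
def approx_num_hashes(num_peaks: int, fan_value: int) -> int:
    """Rough upper bound: each peak pairs with up to fan_value later peaks."""
    return num_peaks * fan_value

# tripling the fan value roughly triples fingerprints (and storage)
print(approx_num_hashes(100_000, 5))   # 500000
print(approx_num_hashes(100_000, 15))  # 1500000
```

The same accuracy/storage trade-off applies to DEFAULT_AMP_MIN: a lower amplitude threshold admits more peaks, which multiplies through this same estimate.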

Recognizing

There are two ways to recognize audio using Dejavu. You can recognize by reading and processing files on disk, or through your computer's microphone.

Recognizing: On Disk

Through the terminal:

$ python dejavu.py --recognize file sometrack.wav 
{'total_time': 2.863781690597534, 'fingerprint_time': 2.4306554794311523, 'query_time': 0.4067542552947998, 'align_time': 0.007731199264526367, 'results': [{'song_id': 1, 'song_name': 'Taylor Swift - Shake It Off', 'input_total_hashes': 76168, 'fingerprinted_hashes_in_db': 4919, 'hashes_matched_in_input': 794, 'input_confidence': 0.01, 'fingerprinted_confidence': 0.16, 'offset': -924, 'offset_seconds': -30.00018, 'file_sha1': b'3DC269DF7B8DB9B30D2604DA80783155912593E8'}, {...}, ...]}

or in scripting, assuming you've already instantiated a Dejavu object:

>>> from dejavu.logic.recognizer.file_recognizer import FileRecognizer
>>> song = djv.recognize(FileRecognizer, "va_us_top_40/wav/Mirrors - Justin Timberlake.wav")

Recognizing: Through a Microphone

With scripting:

>>> from dejavu.logic.recognizer.microphone_recognizer import MicrophoneRecognizer
>>> song = djv.recognize(MicrophoneRecognizer, seconds=10) # Defaults to 10 seconds.

and with the command line script, you specify the number of seconds to listen:

$ python dejavu.py --recognize mic 10

Testing

Testing out different parameterizations of the fingerprinting algorithm is often useful as the corpus becomes larger and larger, and inevitable tradeoffs between speed and accuracy come into play.

Confidence

Test your Dejavu settings on a corpus of audio files on a number of different metrics:

  • Confidence of match (number of fingerprints aligned)
  • Offset matching accuracy
  • Song matching accuracy
  • Time to match

Accuracy

An example script is given in test_dejavu.sh, shown below:

#####################################
### Dejavu example testing script ###
#####################################

###########
# Clear out previous results
rm -rf ./results ./temp_audio

###########
# Fingerprint files of extension mp3 in the ./mp3 folder
python dejavu.py --fingerprint ./mp3/ mp3

##########
# Run a test suite on the ./mp3 folder by extracting 1, 2, 3, 4, and 5 
# second clips sampled randomly from within each song 8 seconds 
# away from start or end, sampling offset with random seed = 42, and finally, 
# store results in ./results and log to ./results/dejavu-test.log
python run_tests.py \
    --secs 5 \
    --temp ./temp_audio \
    --log-file ./results/dejavu-test.log \
    --padding 8 \
    --seed 42 \
    --results ./results \
    ./mp3

The testing scripts are, as of now, a bit rough and could certainly use some love and attention if you're interested in submitting a PR! For example, underscores in audio filenames currently break the test scripts.

How does it work?

The algorithm works off a fingerprint-based system, much like Shazam's.

The "fingerprints" are locality-sensitive hashes that are computed from the spectrogram of the audio. This is done by taking the FFT of the signal over overlapping windows of the song and identifying peaks. A very robust peak finding algorithm is needed; otherwise you'll have a terrible signal to noise ratio.
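The windowed-FFT step can be sketched with scipy. This is a minimal illustration on a synthetic tone; the sample rate, window size, and overlap below are arbitrary choices, not dejavu's exact settings:

```python
import numpy as np
from scipy.signal import spectrogram

# one second of a pure 440 Hz tone as a stand-in for real audio
fs = 44100
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440 * t)

# FFT over overlapping windows -> 2D time/frequency amplitude grid
freqs, times, sxx = spectrogram(signal, fs=fs, nperseg=4096, noverlap=2048)

# the strongest frequency bin should sit near 440 Hz
peak_freq = freqs[np.argmax(sxx.sum(axis=1))]
print(round(peak_freq, 1))
```

The resulting `sxx` array is the spectrogram the rest of the pipeline works on: peaks are found in it, then combined into hashes.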

Here I've taken the spectrogram over the first few seconds of "Blurred Lines". The spectrogram is a 2D plot and shows amplitude as a function of time (a particular window, actually) and frequency, binned logarithmically, just as the human ear perceives it. In the plot below you can see where local maxima occur in the amplitude space:

Spectrogram

Finding these local maxima is a combination of a high pass filter (a threshold in amplitude space) and some image processing techniques to find maxima. A concept of a "neighborhood" is needed: a local maximum that beats only its directly adjacent pixels is a poor peak, one that will not survive the noise of coming through speakers and through a microphone.
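The neighborhood idea can be sketched with scipy's maximum filter (the same primitive dejavu's peak finding is built on). The parameter names below mirror the settings in config/settings.py, but the values and toy data are purely illustrative:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def find_peaks(spec: np.ndarray, neighborhood: int = 10,
               amp_min: float = 10.0) -> np.ndarray:
    """Return (freq_bin, time_bin) indices of local maxima above amp_min."""
    # a cell survives if it is the maximum over its whole neighborhood...
    local_max = maximum_filter(spec, size=neighborhood) == spec
    # ...and clears the amplitude threshold (the "high pass filter" above)
    return np.argwhere(local_max & (spec > amp_min))

# toy spectrogram with one strong component
spec = np.zeros((128, 128))
spec[40, 60] = 50.0
print(find_peaks(spec))  # -> [[40 60]]
```

Note that without the amplitude threshold, every cell in a flat region would trivially equal its neighborhood maximum; the threshold is what keeps the noise floor out.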

If we zoom in even closer, we can begin to imagine how to bin and discretize these peaks. Finding the peaks itself is the most computationally intensive part, but it's not the end. Peaks are combined using their discrete time and frequency bins to create a unique hash for that particular moment in the song - creating a fingerprint.
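The pairing-and-hashing step might look like the following sketch. It captures the idea of combining nearby peaks into hashes, but it is not dejavu's exact scheme; the function name, fan_value pairing window, and hash truncation are illustrative assumptions:

```python
import hashlib

def hash_peaks(peaks, fan_value=5):
    """Combine time-ordered (freq_bin, time_bin) peaks into (hash, offset) pairs.

    A sketch of the pairing idea, not dejavu's exact hashing scheme.
    """
    fingerprints = []
    for i, (f1, t1) in enumerate(peaks):
        # pair this peak with up to fan_value subsequent peaks
        for f2, t2 in peaks[i + 1 : i + 1 + fan_value]:
            dt = t2 - t1  # discrete time delta between the two peaks
            h = hashlib.sha1(f"{f1}|{f2}|{dt}".encode()).hexdigest()[:20]
            fingerprints.append((h, t1))  # hash anchored at the first peak's time
    return fingerprints

peaks = [(40, 10), (55, 12), (23, 15)]
print(len(hash_peaks(peaks)))  # 3 pairs from these 3 peaks
```

Because each hash encodes two frequencies and their time delta, it survives a global time shift: the same pair of peaks heard later in a recording yields the same hash, just at a different offset, which is what makes alignment possible at match time.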

Spectrogram zoomed

For a more detailed look at the making of Dejavu, see my blog post here.

How well it works

To truly get the benefit of an audio fingerprinting system, matching can't take a long time. A slow match is a bad user experience, and furthermore, a user may only decide to try to match the song with a few precious seconds of audio left before the radio station goes to a commercial break.

To test Dejavu's speed and accuracy, I fingerprinted a list of 45 songs from the US VA Top 40 from July 2013 (I know, their counting is off somewhere). I tested in three ways:

  1. Reading the raw mp3 -> wav data from disk
  2. Playing the song over speakers with Dejavu listening on the laptop microphone
  3. Playing compressed streamed music on my iPhone with Dejavu again listening on the laptop microphone

Below are the results.

1. Reading from Disk

Reading from disk yielded an overwhelming 100% recall: no mistakes were made over the 45 songs I fingerprinted. Since Dejavu gets all of the samples from the song (without noise), it would be a nasty surprise if reading the same file from disk didn't work every time!

2. Audio over laptop microphone

Here I wrote a script to randomly choose n seconds of audio from the original mp3 file to play and had Dejavu listen over the microphone. To be fair, I only allowed segments of audio that were more than 10 seconds from the start/end of the track to avoid listening to silence.

Additionally my friend was even talking and I was humming along a bit during the whole process, just to throw in some noise.

Here are the results for different values of listening time (n):

Matching time

This is pretty rad. For the percentages:

| Number of Seconds | Number Correct | Percentage Accuracy |
| ----------------- | -------------- | ------------------- |
| 1                 | 27 / 45        | 60.0%               |
| 2                 | 43 / 45        | 95.6%               |
| 3                 | 44 / 45        | 97.8%               |
| 4                 | 44 / 45        | 97.8%               |
| 5                 | 45 / 45        | 100.0%              |
| 6                 | 45 / 45        | 100.0%              |

Even with only a single second, randomly chosen from anywhere in the song, Dejavu gets 60%! Going from 1 to 2 seconds gets us to around 96%, while a perfect score took 5 seconds or more. Honestly, when I was testing this myself, I found Dejavu beat me: identifying a song from only 1-2 seconds heard out of context is pretty hard. I had even been listening to these same songs for two days straight while debugging...

In conclusion, Dejavu works amazingly well, even with next to nothing to work with.

3. Compressed streamed music played on my iPhone

Just to try it out, I tried playing music from my Spotify account (160 kbit/s compressed) through my iPhone's speakers with Dejavu again listening on my MacBook mic. I saw no degradation in performance; 1-2 seconds was enough to recognize any of the songs.

Performance

Speed

On my MacBook Pro, matching was done at 3x listening speed with a small constant overhead. To test, I tried different recording times and plotted the recording time plus the time to match. Since the speed is mostly invariant of the particular song and more dependent on the length of the spectrogram created, I tested on a single song, "Get Lucky" by Daft Punk:

Matching time

As you can see, the relationship is quite linear. The line you see is a least-squares linear regression fit to the data, with the corresponding line equation:

1.364757 * record_time - 0.034373 = time_to_match

Notice that since the matching itself is single threaded, the total time includes the recording time. This makes sense with the 3x speed in purely matching, as:

1 (recording) + 1/3 (matching) = 4/3 ≈ 1.33

which is close to the fitted slope of 1.364757 if we disregard the minuscule constant term.

The overhead of peak finding is the bottleneck - I experimented with multithreading and realtime matching, and alas, it wasn't meant to be in Python. An equivalent Java or C/C++ implementation would most likely have little trouble keeping up, applying FFT and peakfinding in realtime.

An important caveat is, of course, the round trip time (RTT) for making matches. Since my MySQL instance was local, I didn't have to deal with the latency penalty of transferring fingerprint matches over the air. This would add RTT to the constant term in the overall calculation, but would not affect the matching process.

Storage

For the 45 songs I fingerprinted, the database used 377 MB of space for 5.4 million fingerprints. In comparison, the disk usage is given below:

| Audio Information Type | Storage in MB |
| ---------------------- | ------------- |
| mp3                    | 339           |
| wav                    | 1885          |
| fingerprints           | 377           |

There's a pretty direct trade-off between the necessary record time and the amount of storage needed. Adjusting the amplitude threshold for peaks and the fan value for fingerprinting will add more fingerprints and bolster the accuracy at the expense of more space.
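As a sanity check on the figures above, the per-fingerprint cost works out to roughly 70-odd bytes (pure arithmetic on the numbers reported in this section):

```python
# storage figures from above: 377 MB for the 5,442,376 fingerprints
total_bytes = 377 * 1024 * 1024
num_fingerprints = 5_442_376

bytes_per_fp = total_bytes / num_fingerprints
print(round(bytes_per_fp))  # about 73 bytes per fingerprint
```

That per-hash cost is what the amplitude threshold and fan value multiply against when you tune for accuracy.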

Comments
  • Dejavu on python 3.6.6

    Since support for Python 2.7 is coming to an end, I've decided to migrate the code to Python 3.6.6. In the process I've refactored the solution a bit to make it simpler (at least the SQL part), and I've also refactored some code to use numpy more effectively, removing unnecessary steps when working with lists.

    I've also updated all of the libraries being used to their latest versions at this moment.

    The solution is working now with mysql 8 by default.

    UPDATED:

    Now I've added support for Postgresql as well.

    opened by mauricio-repetto 52
  • Speed improvements

    Are there any possible tweaks that could be made to the algorithm to improve performance?

    dejavu currently seems to net about 4x real-time fingerprinting speeds.

    I've been testing with lower sample rates, different window sizes, and overlap ratios, and I think there are certainly speedups to be found if we can find the sweet spot between speed and accuracy (none of my tests have acceptable accuracy as of yet, with varying speedups).

    Since I'm not too familiar with the exact algorithm I thought it might be a better idea to involve everyone in it.

    opened by Wessie 23
  • New SQLAlchemy DB backend, and fixes

    Implemented optional SQLAlchemy backend. Added unit tests for the ORM backend and the previous SQL backend. Added pip requirements.txt. Added examples for the new backend. Fixed #20.

    opened by pguridi 19
  • Split big files

    Added support for huge files; check long_test.py and play with the minutes and processes numbers to fit in the available RAM.

    Edit: Procedure of the process:

    1. Splits the large audio file to fingerprintable smaller pieces
    2. Fingerprints them one by one.
    3. Saves them as a single audio in the DataBase.
    opened by thesunlover 18
  • issues with SQL backend and Unique Constraint

    I'm having trouble with the unique constraint, or maybe I'm getting something wrong. Here goes (talking about the plain SQL backend, not the ORM). The schema dump looks like:

    CREATE TABLE IF NOT EXISTS `fingerprints` (
      `hash` binary(10) NOT NULL,
      `song_id` mediumint(8) unsigned NOT NULL,
      `offset` int(10) unsigned NOT NULL,
      PRIMARY KEY (`hash`),
      UNIQUE KEY `song_id` (`song_id`,`offset`,`hash`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    

    "hash" is a primary key, which means it will be unique for each record regardless of the other columns. What happens is that I fingerprint 2 different files and get some hashes that are already in the DB, but from the other file (with a different offset). Because hash is unique, an IntegrityError is raised, and the hash of the new file is never saved.

    Am I missing something?.

    opened by pguridi 18
  • Update to Python 3

    Hi,

    This is a great package! I've been working with it for a while now. However, since I do most of my work in Python 3, it's been a little frustrating that dejavu only works with Python 2.

    I finally got around to making the necessary updates.

    I've tested the code under Python 3 and everything seems to work well.

    Please accept my changes and merge into the master repository.

    Best regards, Andrew.

    opened by datawookie 16
  • Multiple fragments from one mp3 file

    Hi,

    I've been playing around with dejavu and am able to recognize one fingerprinted fragment, but what I would like is to recognize all the known audio fragments within one 10 min audio file. I've been reading through the issues but wasn't able to find similar questions or examples. Can anyone please help me out with this? I must also say that I'm a newbie to Python :)

    Cheers, Henk

    opened by henkit 13
  • Keep getting results when tested with songs that aren't stored in the database

    I tried to search for songs that aren't stored in the database, but I keep getting results (not an error message). My tuning:

    FINGERPRINT_REDUCTION = 20
    PEAK_SORT = True
    DEFAULT_OVERLAP_RATIO = 0.5
    DEFAULT_FAN_VALUE = 15
    DEFAULT_AMP_MIN = 10
    PEAK_NEIGHBORHOOD_SIZE = 20

    Where did I go wrong? I also couldn't find any songs when I changed peak sort to false.

    opened by fainneth 11
  • Add Python 3 compatibility


    A bit slap-dash, but nothing major modified. Tweaks generally break down into 4 categories:

    1. print()
    2. Builtin/library generator wrangling (xrange, izip, forcing generator evaluation...)
    3. Explicit relative imports (along with cleaning up imports I generally sorted them into stdlib vs. other)
    4. Removing spaces at end of lines (because my editor does it automatically)

    Another minor tweak was explicitly encoding a string (unicode in Python 3) into UTF-8 so it could be hashed.

    With the changes, running example.py works just fine, but let me know if you want the commits cleaned up a bit.

    opened by nicktimko 11
  • Song recognition from file recorded with smartphone

    Hey! Dejavu is great, but I'm facing problems when I try to recognize songs from audio files recorded with a smartphone.

    Dejavu finds a song but it's never the right song, so is there a way to stream audio data from Android to dejavu so I can use mic recognition? (I'm developing a PhoneGap app.) Thanks so much!

    opened by caveandre 8
  • How alignment works?

    Hi, Dejavu is a great project but I am unable to understand how it calculates the relative offset for a segment of sound. According to the documentation, the following formula is used:

    difference = database offset from original track - sample offset from recording

    To my understanding, the offset is the difference in time between the actual start of the song and the start of the segment. I know something is wrong.

    How is the relative offset calculated?

    opened by pk97 8
  • Failed to solve

    I already finished line 18 of the README, but when I write the command on line 21 it shows [failed to solve] in the terminal. How can I solve this? I'm trying to install Docker (on a Raspberry Pi 4); what should I do?

    opened by NormanAnderson42 1
  • Do maximum_filter with cupy instead of scipy

    https://github.com/worldveil/dejavu/blob/d2b8761eb39f8e2479503f936e9f9948addea8ea/dejavu/fingerprint.py#L98

    Replace the code above with the following:

    import cupy as cp
    from cupyx.scipy.ndimage import maximum_filter as cp_maximum_filter
    array = cp.array(arr2D)
    local_max = cp.asnumpy(cp_maximum_filter(array, footprint=cp.array(neighborhood)) == array)
    del array
    

    | Single Channel Length | Audio Duration | cupy   | scipy  |
    | --------------------- | -------------- | ------ | ------ |
    | 62622441              | 24 min         | 45.92s | 49.69s |
    | 31311220              | 12 min         | 10.6s  | 25.68s |
    | 15655610              | 6 min          | 1.56s  | 12.62s |
    | 7827805               | 3 min          | 0.72s  | 6.18s  |
    | 3913902               | 1.5 min        | 0.34s  | 3.09s  |

    Environment: Ryzen 7 5800H, RTX3060 6G Mobile, Windows 10 21H2, Python 3.7.9, CUDA 11.0, cupy-cuda110

    This means cupy saves up to 10 seconds when processing 3 minutes of dual-channel audio. Meanwhile, it costs up to 1.5 GB of video memory for a 24-minute audio file. https://cupy.dev/

    opened by DeltaFlyerW 0
  • Trying to fingerprint about 200 000 files. After 15000 files INSERT operation is very slow.

    Hi all. I'm stuck trying to fingerprint a big database of music. Because of the billions of index entries generated, the INSERT operation takes a huge amount of time. For example, the first 3000 files on an empty database (I use postgres) were ready in 3 hours, but after 15000 files had been added, another 3000 files took more than 10 hours, and it looks like the time will grow exponentially. Are there any tweaks for dealing with the indexes, or is it possible to run dejavu without indexes?

    Cheers Denis

    opened by unbrokendub 1
  • run_tests.py: error: the following arguments are required src

    I'm a college student trying to use the dejavu package for our team's 4th year project. I'm having this "src" problem that I don't know how to deal with; any clue?

    opened by FlupFlip 4
  • Possibility of partial match

    Hi. I just wanted to know if it's possible to use this project to scan ambient sound and send an alarm when it matches only partially, not 100%.

    I'll try to explain my request: my son is epileptic, and all the doctors say that the monitoring systems for seizures during the night are not very reliable. The only thing that helps us be alerted when an epileptic seizure is starting is a baby monitor, because the sound of his breathing changes in a very particular way during a seizure. My idea is this: if I record the sound of my son's breathing during some epileptic seizures (let's say 20 times) and put those files in the dejavu database, is it possible that dejavu will find a match, even if it's not a 100% match? And is it possible to keep dejavu active during the night and have it send alerts in those cases?

    If I didn't explain the problem well, tell me and I'll try to explain it better. Thank you.

    opened by damares86 5
Owner

Will Drevo