Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

Overview

A Python library for audio feature extraction, classification, segmentation and applications

This document contains general information; the complete documentation is in the wiki. For a more general introduction to audio data handling, read this article.

General

pyAudioAnalysis is a Python library covering a wide range of audio analysis tasks. Through pyAudioAnalysis you can:

  • Extract audio features and representations (e.g. MFCCs, spectrogram, chromagram)
  • Train, tune the parameters of, and evaluate classifiers of audio segments
  • Classify unknown sounds
  • Detect audio events and exclude silence periods from long recordings
  • Perform supervised segmentation (joint segmentation - classification)
  • Perform unsupervised segmentation (e.g. speaker diarization) and extract audio thumbnails
  • Train and use audio regression models (example application: emotion recognition)
  • Apply dimensionality reduction to visualize audio data and content similarities

Installation

  • Clone the source of this library: git clone https://github.com/tyiannak/pyAudioAnalysis.git
  • Install dependencies: pip install -r ./requirements.txt
  • Install using pip: pip install -e .

An audio classification example

More examples and detailed tutorials can be found at the wiki

pyAudioAnalysis provides easy-to-call wrappers for audio analysis tasks. For example, the following code first trains an audio segment classifier from a set of WAV files stored in folders (each folder representing a different class), then uses the trained classifier to classify an unknown WAV file:

from pyAudioAnalysis import audioTrainTest as aT
aT.extract_features_and_train(["classifierData/music","classifierData/speech"], 1.0, 1.0, aT.shortTermWindow, aT.shortTermStep, "svm", "svmSMtemp", False)
aT.file_classification("data/doremi.wav", "svmSMtemp","svm")

Result: (0.0, array([ 0.90156761, 0.09843239]), ['music', 'speech'])
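
The result above can be read, as an assumption based on its layout rather than on anything stated in this document, as (index of the winning class, per-class probabilities, class names). A minimal sketch of unpacking such a result follows; it uses a plain list in place of the NumPy array, and `winning_class` is a hypothetical helper, not a library function.

```python
def winning_class(result):
    """Return (class name, probability) for a result shaped like the one above.

    Assumes result = (class_index, probabilities, class_names); this layout
    is inferred from the example output, not taken from the library docs.
    """
    class_index, probabilities, class_names = result
    idx = int(class_index)  # the index is reported as a float (0.0)
    return class_names[idx], probabilities[idx]


# Values copied from the example output above.
result = (0.0, [0.90156761, 0.09843239], ["music", "speech"])
name, prob = winning_class(result)
print(f"{name} ({prob:.1%})")
```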

In addition, command-line support is provided for all functionalities. E.g. the following command extracts the spectrogram of an audio signal stored in a WAV file: python audioAnalysis.py fileSpectrogram -i data/doremi.wav

Further reading

Apart from this README file, to better understand how to use this library one should read the following:

@article{giannakopoulos2015pyaudioanalysis,
  title={pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis},
  author={Giannakopoulos, Theodoros},
  journal={PloS one},
  volume={10},
  number={12},
  year={2015},
  publisher={Public Library of Science}
}

For Matlab-related audio analysis material check this book.

Author

Theodoros Giannakopoulos, Principal Researcher of Multimodal Machine Learning at the Multimedia Analysis Group of the Computational Intelligence Lab (MagCIL) of the Institute of Informatics and Telecommunications, of the National Center for Scientific Research "Demokritos"

Comments
  • "tuple index out of range" when training regression

    When I try to run featureAndTrainRegression I get this:

    File "tbtrainer_regression.py", line 3, in <module>
        att.featureAndTrainRegression('wav',1.0,1.0,att.shortTermWindow,att.shortTermStep,'svm','mesto',True)
      File "/home/oliver/Documents/pyAudioAnalysis/pyAudioAnalysis/audioTrainTest.py", line 433, in featureAndTrainRegression
        numOfFeatures = featuresFinal[0].shape[1]
    IndexError: tuple index out of range
    

    This is after it finishes analyzing the 15 files in the directory. featureAndTrain works fine for me.

    opened by NaturalFigurehead 7
  • Import Error- No module named eyeD3

    I am trying to use pyAudioAnalysis but have long been stuck on one issue: "no module named eyeD3". I have all libraries (including eyeD3) installed, and there is probably no directory issue. I am using Python 2.7.13. Whatever I import with "from pyAudioAnalysis import XYZ", it raises the eyeD3 import error. I would be glad for your help in this regard.

    opened by joyjitchatterjee 7
  • setup.py

    This could probably do with a setup.py.

    To make this work, everything should be moved into a subdirectory.

    Is there a preferred way to install pyAudioAnalysis currently?

    opened by stuaxo 7
  • problem with nchroma in stChromaFeatures

    Hi, I got the following error:

    Traceback (most recent call last):
      File "/home/jmw/Bureau/bacasable/musiques-stream/testpydub.py", line 42, in <module>
        F = pafe.stFeatureExtraction(mysample[:2*w], Fs, w, pas)
      File "/home/jmw/Bureau/bacasable/musiques-stream/pyAudioAnalysis-master/pyAudioAnalysis/audioFeatureExtraction.py", line 591, in stFeatureExtraction
        chromaNames, chromaF = stChromaFeatures(X, Fs, nChroma, nFreqsPerChroma)
      File "/home/jmw/Bureau/bacasable/musiques-stream/pyAudioAnalysis-master/pyAudioAnalysis/audioFeatureExtraction.py", line 283, in stChromaFeatures
        print "nFreqsPerChroma[nChroma]",nFreqsPerChroma[nChroma]
    IndexError: index 56 is out of bounds for axis 1 with size 55

    This happens when the window is small, e.g. below max(nChroma). In my case the window size is 110 samples (10 ms with Fs=11000), so the half window is 55 and C /= nFreqsPerChroma[nChroma] crashes.

    The error happens only with a small window, e.g. Fs=11000 with a 10 ms window, so I set Win=110 samples and half window = 55 for the FFT.

    I put a few prints in the code and saw that the nChroma table contains 92 samples, so the line C /= nFreqsPerChroma[nChroma] in stChromaFeatures crashes. This is because the nFreqsPerChroma array is of size nChroma (=55), but nChroma contains values larger than 55. I think you should have a look at this code. Best regards
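
The out-of-range index can be reproduced without the library. The sketch below assumes that the chroma bin of each FFT frequency is computed as round(12 * log2(f / 27.5)); this formula is chosen here because it yields a largest index of 92, the number mentioned in the report, and it is not confirmed by the report itself. `chroma_bin_indices` is a hypothetical helper.

```python
import math


def chroma_bin_indices(n_fft, fs, ref_freq=27.5):
    """Map each of n_fft frequency bins to a chroma bin index.

    Assumption: bin = round(12 * log2(f / ref_freq)). This is an
    illustrative formula, not necessarily the library's exact code.
    """
    freqs = [(k + 1) * fs / (2 * n_fft) for k in range(n_fft)]
    return [round(12 * math.log2(f / ref_freq)) for f in freqs]


n_fft = 55                      # half of the 110-sample window
bins = chroma_bin_indices(n_fft, fs=11000)
counts = [0.0] * n_fft          # same length as the bin table, as in the report

print(len(bins), max(bins))     # table length versus largest index it holds

try:
    [counts[b] for b in bins]   # mimics indexing a size-55 array by the bin table
    failed = False
except IndexError:
    failed = True
print("IndexError raised:", failed)
```

With a larger window the largest index stays the same while the table grows, which would be consistent with the observation that only small windows trigger the error.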

    opened by jmw67 7
  • Compatible with both Python 2 and 3

    Hello,

    I have modified the code to be compatible with both Python2 and Python3. The following are all the changes I made:

    • Ran the code through a PEP 8 linter to make it PEP 8 compliant (spaces to tabs, spaces around arithmetic operators, etc.)
    • Changed "import cPickle" to "from six.moves import cPickle" to be compatible with Python 2 and 3
    • Changed print statements from "print 'blah blah'" to "print("blah blah")"
    • Changed the order of imports to alphabetical in every file
    • Changed some divisions from '/' to '//' to get integer output under both Python versions
    • Added requirements.txt files for Python2 and Python3
    • Appended .gitignore file with lines from git's standard .gitignore file for Python
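
To illustrate the division change in the list above: under Python 3 the `/` operator always returns a float, so code that uses the result as a size or an index needs `//`. A minimal sketch with made-up values (`num_samples` and `window` are illustrative names, not taken from the library):

```python
num_samples = 1000
window = 300

true_div = num_samples / window     # float under Python 3
floor_div = num_samples // window   # int under both Python 2 and 3

print(type(true_div).__name__, type(floor_div).__name__)

# An integer is required wherever the value is used as a size or index.
frames = [0] * floor_div
```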

    I have tested the code with both Python 2 and Python 3, and it seems to work well. I ran the same code as in your wiki and got the same saved files and the same generated graphs.

    Let me know if you face any problems! :)

    Regards, Vikram

    opened by voletiv 6
  • ValueError: operands could not be broadcast together with shapes (5,5) (4,4)

    I am getting the following error when running aT.extract_features_and_train(["<path>\Audio Classification\BabyCry","<path>\Audio Classification\GlassBreaking", "<path>\Audio Classification\Gunshot", "<path>\Audio Classification\Falling", "<path>\Audio Classification\Background"], 1.0, 1.0, aT.shortTermWindow, aT.shortTermStep, "randomforest", "randomForest2", False). All the data in the folders are .wav files.

    <more outputs>
    Feature extraction complexity ratio: 26.6 x realtime
    Analyzing file 1 of 3: <path>\Audio Classification\Background\2019-05-24-12-26-34-355000__1.wav
    Analyzing file 2 of 3: <path>\Audio Classification\Background\2019-05-24-12-26-34-355000__22.wav
    Analyzing file 3 of 3:<path>\Audio Classification\Background\2019-05-24-12-26-34-355000__24.wav
    Feature extraction complexity ratio: 25.8 x realtime
    Param = 10.00000 - classifier Evaluation Experiment 1 of 26
    
    Traceback (most recent call last):
      File "<path>\main.py", line 3, in <module>
        aT.extract_features_and_train(["<path>\Audio Classification\BabyCry","<path>\GlassBreaking", "D:\Research\Datasets\Audio Classification\Gunshot", "<path>\Audio Classification\Falling", "<path>\Audio Classification\Background"], 1.0, 1.0, aT.shortTermWindow, aT.shortTermStep, "randomforest", "randomForest2", False)
      File "<path>t\anaconda3\lib\site-packages\pyAudioAnalysis\audioTrainTest.py", line 298, in extract_features_and_train
        best_param = evaluate_classifier(features, class_names, classifier_type,
      File "<path>\anaconda3\lib\site-packages\pyAudioAnalysis\audioTrainTest.py", line 623, in evaluate_classifier
        cm = cm + cmt
    ValueError: operands could not be broadcast together with shapes (5,5) (4,4) 
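
The report does not say what caused the shape mismatch. One possible explanation, offered here purely as an assumption, is that one evaluation split contained only four of the five classes, so its confusion matrix came out 4x4. The sketch below reproduces the NumPy error and shows a hypothetical helper, `fixed_size_confusion_matrix` (not part of pyAudioAnalysis), that always returns an n_classes x n_classes matrix so that per-split matrices can be summed.

```python
import numpy as np


def fixed_size_confusion_matrix(y_true, y_pred, n_classes):
    """Count (true, predicted) pairs into an n_classes x n_classes matrix.

    Hypothetical helper for illustration; not part of pyAudioAnalysis.
    """
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


# Adding matrices of different sizes raises the error from the traceback.
try:
    np.zeros((5, 5)) + np.zeros((4, 4))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# A split in which class 4 never appears still yields a 5x5 matrix.
cm_total = np.zeros((5, 5), dtype=int)
cm_split = fixed_size_confusion_matrix([0, 1, 2, 3], [0, 1, 2, 2], n_classes=5)
cm_total = cm_total + cm_split
print(cm_total.shape)
```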
    
    opened by MiPlayer123 5
  • Non-music detection training fails

    I'm trying to make a classifier with pyAudioAnalysis to detect non-music sections of songs. When I set everything up and run it, the trainer blows through every step in about 5 seconds total. It neither learns, nor changes with different parameters. If 40% of the data is music, it gets 40% wrong. It simply chooses to predict either music or non-music for every piece of data. Any insight into what might cause this?

    opened by tomjl 5
  • eyeD3 version

    This project is amazing! I have tried the speaker diarization feature and it really works. Super cool! However, it would be worthwhile to specify the version of eyeD3 being used. While I was installing this package, there was an error in the eyeD3 module. The package does not work with the latest eyeD3 versions (0.7.x), only with versions 0.6.x.

    I hope this project will be added to PyPI. The project is promising!

    opened by bninopaul 5
  • What window size to use for beat_extraction() ?

    The beat_extraction function is defined in MidTermFeatures.py. It requires two arguments: the short_features and the window size in seconds. Maybe it is because I am new to audio analysis, but I've been struggling to figure out what the right window size should be. Usually I encounter this error: "ValueError: attempt to get argmax of an empty sequence"; in other cases the estimated bpm always ends up as 60.

    The term "window size" seems ambiguous to me. I thought it could be either of two things: (1) the length of the audio snippet for which I want to estimate the bpm, or (2) the short-term window size I used previously to get the short-term features.

    For reference, this is how I am extracting the short-term features. From my understanding, the window size in this case would be 1 second.

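    from pyAudioAnalysis import audioBasicIO, ShortTermFeatures

    [Fs, x] = audioBasicIO.read_audio_file(path)
    x = audioBasicIO.stereo_to_mono(x)
    # for reference: feature_extraction(signal, sampling_rate, window, step, deltas=True)
    # window and step are given in samples; the call returns (features, feature_names)
    features, f_names = ShortTermFeatures.feature_extraction(x, Fs, 1 * Fs, 1 * Fs)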
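
In the call above the window and step are passed in samples, so `1 * Fs` corresponds to one second of audio. A small sketch of the conversion, using a hypothetical helper `seconds_to_samples` and an assumed sampling rate of 16000 Hz (neither comes from the report):

```python
def seconds_to_samples(seconds, sampling_rate):
    """Convert a duration in seconds to a whole number of samples."""
    return int(round(seconds * sampling_rate))


Fs = 16000  # assumed sampling rate, for illustration only

one_second = seconds_to_samples(1.0, Fs)    # same as 1 * Fs in the snippet above
fifty_ms = seconds_to_samples(0.050, Fs)    # a much shorter short-term window

print(one_second, fifty_ms)
```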
    
    opened by ThuongCroud 4
  • Out of date wiki

    On the segmentation wiki page, there are code examples which seem to be from an earlier version (pre-refactoring).

    To bring these up to date, references to the following functions should be updated

    1. mtFileClassification() -> mid_term_file_classification()
    2. mtFeatureExtraction() -> mid_feature_extraction()
    3. trainHMM_fromFile() -> train_hmm_from_file()
    4. trainHMM_fromDir() -> train_hmm_from_dir()
    5. hmmSegmentation() -> hmm_segmentation()
    6. evaluateSegmentationClassificationDir() -> evaluate_segmentation_classification_dir()
    7. aS.silenceRemoval(x, Fs, 0.020, 0.020, smoothWindow = 1.0, Weight = 0.3, plot = True) -> segments = aS.silence_removal(x, Fs, 0.020, 0.020, smooth_window = 1.0, weight = 0.3, plot = True)
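
The renames listed above can be applied mechanically to old snippets. The sketch below is one way to do it, not a tool shipped with the library: the mapping is copied from the list, while `RENAMES` and `modernize` are hypothetical names introduced for illustration.

```python
import re

# Old (pre-refactoring) names mapped to current names, taken from the list above.
RENAMES = {
    "mtFileClassification": "mid_term_file_classification",
    "mtFeatureExtraction": "mid_feature_extraction",
    "trainHMM_fromFile": "train_hmm_from_file",
    "trainHMM_fromDir": "train_hmm_from_dir",
    "hmmSegmentation": "hmm_segmentation",
    "evaluateSegmentationClassificationDir": "evaluate_segmentation_classification_dir",
    "silenceRemoval": "silence_removal",
    "smoothWindow": "smooth_window",
    "Weight": "weight",
}

# Match whole identifiers only, so "Weight" does not touch e.g. "myWeight".
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def modernize(snippet):
    """Rewrite old identifiers in a code snippet to their current names."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], snippet)


old = "aS.silenceRemoval(x, Fs, 0.020, 0.020, smoothWindow = 1.0, Weight = 0.3, plot = True)"
new = modernize(old)
print(new)
```

As the report notes, the list may be incomplete, so any output of such a rewrite should still be checked against the current source.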

    I may have missed a couple. I would have PRed but it seems like I can't for wiki pages. Great library!

    opened by HPrickettMorgan 4
  • How to export diarization data to a file?

    #54 addressed this briefly, to "use flags2segs" but I'm not sure where/when/how is the best way to put that function call. Could someone walk through what that process would look like?

    opened by ossim 4
  • AttributeError: module 'pyAudioAnalysis.audioTrainTest' has no attribute 'featureAndTrain'

    Hello, I am getting the error below while running my program. I believe there have been some updates to pyAudioAnalysis.

    AttributeError: module 'pyAudioAnalysis.audioTrainTest' has no attribute 'featureAndTrain'

    I am wondering if there is a possibility to update this script:

    from pyAudioAnalysis import audioTrainTest as aT
    aT.featureAndTrain("...")

    Thank you :)

    opened by Emi77H 0
  • Plot the wav image Value Error on silence_removal

    Try this wave sound

    https://drive.google.com/file/d/1IMnX38HGDSl7aweJoMygqIgIOizdhXF0/view?usp=share_link

    Reference https://github.com/tyiannak/pyAudioAnalysis/blob/master/pyAudioAnalysis/audioSegmentation.py Line 798

    Please help

    opened by mengtongun 0
  • Fix for Issue 376 about read_audio_generic's signal shape

    This pull request fixes issue #376. I know you should use read_audio_file, which calls read_audio_generic if needed and then flattens the signal, but one might end up using the generic function directly after discovering it through hints in an environment such as Jupyter notebooks.
    On the other hand this does not break the current behavior of the read_audio_file -> read_audio_generic option.

    opened by FrancescoManfredi 0
  • Signal from read_audio_generic causes ValueError in feature_extraction

    What is the problem

    The feature_extraction function requires the signal to be of shape (m, ) and fails when given a signal of shape (m, 1).
    Try running the following code as a test:

    from pyAudioAnalysis import ShortTermFeatures as aF
    from pyAudioAnalysis import audioBasicIO as aIO
    import numpy as np
    Fs, s = aIO.read_audio_file("data/mio_audio.wav")
    print(s.shape)
    Fs2, s2 = aIO.read_audio_generic("data/audio_test.mp3")
    print(s2.shape)
    # extracting features directly from the first signal is ok
    _, _ = aF.feature_extraction(s, Fs, 500, 500, deltas=False)
    # extracting features from the second requires reshaping
    # causes
    # ValueError: shapes (250,1) and (250,40) not aligned: 1 (dim 1) != 250 (dim 0)
    _, _ = aF.feature_extraction(s2, Fs2, 500, 500, deltas=False)
    # reshaping to (m, ) fixes the issue
    s2 = s2.reshape((s2.shape[0], ))
    _, _ = aF.feature_extraction(s2, Fs2, 500, 500, deltas=False)
    

    Why this matters

    This is not a critical flaw but the inconsistency is annoying as it might be hard to spot when working with read_audio_generic.

    Proposed fix

    Add a check at the beginning of feature_extraction to detect signals in the form (m, 1) and reshape them to (m, ).
    OR
    Return a signal with the same shape as read_audio_file from read_audio_generic.
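
The first proposed fix could look roughly like the sketch below. `as_mono_vector` is a hypothetical helper written for illustration, not code from the library or from the pull request; it converts a signal of shape (m, 1) to shape (m, ) and leaves 1-D signals unchanged.

```python
import numpy as np


def as_mono_vector(signal):
    """Return a 1-D view of a signal given as shape (m,) or (m, 1).

    Hypothetical helper sketching the first proposed fix; signals with
    more than one channel are rejected rather than silently flattened.
    """
    signal = np.asarray(signal)
    if signal.ndim == 2 and signal.shape[1] == 1:
        return signal[:, 0]
    if signal.ndim != 1:
        raise ValueError(f"expected shape (m,) or (m, 1), got {signal.shape}")
    return signal


column = np.zeros((500, 1))   # shape as described for read_audio_generic output
flat = as_mono_vector(column)
print(flat.shape)
```

Rejecting multi-channel input is a design choice of this sketch; converting stereo to mono is a separate step and is left to the caller.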

    opened by FrancescoManfredi 0
  • docs: Fix a few typos

    There are small typos in:

    • pyAudioAnalysis/ShortTermFeatures.py
    • pyAudioAnalysis/audioSegmentation.py
    • pyAudioAnalysis/audioTrainTest.py

    Fixes:

    • Should read plotted rather than ploted.
    • Should read optional rather than optinal.
    • Should read exactly rather than exacty.

    Semi-automated pull request generated by https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

    opened by timgates42 0
LibXtract is a simple, portable, lightweight library of audio feature extraction functions.

LibXtract LibXtract is a simple, portable, lightweight library of audio feature extraction functions. The purpose of the library is to provide a relat

Jamie Bullock 215 Nov 16, 2022
praudio provides audio preprocessing framework for Deep Learning audio applications

praudio provides objects and a script for performing complex preprocessing operations on entire audio datasets with one command.

Valerio Velardo 105 Dec 26, 2022
cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding for Python

audioread Decode audio files using whichever backend is available. The library currently supports: Gstreamer via PyGObject. Core Audio on Mac OS X via

beetbox 419 Dec 26, 2022
Audio augmentations library for PyTorch for audio in the time-domain

Audio augmentations library for PyTorch for audio in the time-domain, with support for stochastic data augmentations as used often in self-supervised / contrastive learning.

Janne 166 Jan 8, 2023
spafe: Simplified Python Audio-Features Extraction

spafe aims to simplify feature extraction from mono audio files. The library can extract the following features: BFCC, LFCC, LPC, LPCC, MFCC, IMFCC, MSRCC, NGCC, PNCC, PSRCC, PLP, RPLP, frequency stats, etc. It also provides various filterbank modules (Mel, Bark and Gammatone filterbanks) and other spectral statistics.

Ayoub Malek 310 Jan 1, 2023
C++ library for audio and music analysis, description and synthesis, including Python bindings

Essentia Essentia is an open-source C++ library for audio analysis and audio-based music information retrieval released under the Affero GPL license.

Music Technology Group - Universitat Pompeu Fabra 2.3k Jan 3, 2023
Audio features extraction

Yaafe (Yet Another Audio Feature Extractor). Yaafe can be easily installed with conda.

Yaafe 231 Dec 26, 2022
Python library for audio and music analysis

librosa A python package for music and audio analysis. Documentation See https://librosa.org/doc/ for a complete reference manual and introductory tut

librosa 5.6k Jan 6, 2023
convert-to-opus-cli is a Python CLI program for converting audio files to opus audio format.

convert-to-opus-cli convert-to-opus-cli is a Python CLI program for converting audio files to opus audio format. Installation Must have installed ffmp

null 4 Dec 21, 2022
Audio spatialization over WebRTC and JACK Audio Connection Kit

Audio spatialization over WebRTC Spatify provides a framework for building multichannel installations using WebRTC.

Bruno Gola 34 Jun 29, 2022
a library for audio and music analysis

aubio aubio is a library to label music and sounds. It listens to audio signals and attempts to detect events. For instance, when a drum is hit, at wh

aubio 2.9k Dec 30, 2022
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.

Summary Pyroomacoustics is a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the pack

Audiovisual Communications Laboratory 1k Jan 9, 2023
Marsyas - Music Analysis, Retrieval and Synthesis for Audio Signals

Welcome to MARSYAS. MARSYAS is a software framework for rapid prototyping of audio applications, with flexibility and extensibility as primary concer

Marsyas Developers Group 364 Oct 31, 2022
Python audio and music signal processing library

madmom Madmom is an audio signal processing library written in Python with a strong focus on music information retrieval (MIR) tasks. The library is i

Institute of Computational Perception 1k Dec 26, 2022
Python library for handling audio datasets.

AUDIOMATE Audiomate is a library for easy access to audio datasets. It provides the datastructures for accessing/loading different datasets in a gener

Matthias 121 Nov 27, 2022
A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.

Audiomentations A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio a

Iver Jordal 1.2k Jan 7, 2023
pedalboard is a Python library for adding effects to audio.

pedalboard is a Python library for adding effects to audio. It supports a number of common audio effects out of the box, and also allows the use of VST3® and Audio Unit plugin formats for third-party effects.

Spotify 3.9k Jan 2, 2023
Audio library for modelling loudness

Loudness Loudness is a C++ library with Python bindings for modelling perceived loudness. The library consists of processing modules which can be casc

Dominic Ward 33 Oct 2, 2022