Python implementation of the Short Term Objective Intelligibility measure

Overview

Python implementation of STOI

Implementation of the classical and extended Short Term Objective Intelligibility measures

Intelligibility measure which is highly correlated with the intelligibility of degraded speech signals, e.g., due to additive noise, single/multi-channel noise reduction, binary masking and vocoded speech as in CI simulations. The STOI-measure is intrusive, i.e., a function of the clean and degraded speech signals. STOI may be a good alternative to the speech intelligibility index (SII) or the speech transmission index (STI), when you are interested in the effect of nonlinear processing to noisy speech, e.g., noise reduction, binary masking algorithms, on speech intelligibility.
Description taken from Cees Taal's website

Install

pip install pystoi or pip3 install pystoi

Usage

import soundfile as sf
from pystoi import stoi

clean, fs = sf.read('path/to/clean/audio')
denoised, fs = sf.read('path/to/denoised/audio')

# clean and denoised should have the same length, and be 1D
d = stoi(clean, denoised, fs, extended=False)
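The shape and length requirement noted in the comment above can be enforced before calling stoi. The following is a minimal sketch, not part of pystoi: `prepare_pair` is a hypothetical helper that averages channels to mono and truncates both signals to the shorter length. It assumes multichannel arrays are shaped (samples, channels) and that both signals are already time-aligned at their start.

```python
import numpy as np

def prepare_pair(clean, degraded):
    """Hypothetical helper (not part of pystoi): return two 1D arrays of equal length."""
    clean = np.asarray(clean, dtype=float)
    degraded = np.asarray(degraded, dtype=float)
    # Average the channels if an array is shaped (samples, channels)
    if clean.ndim == 2:
        clean = clean.mean(axis=1)
    if degraded.ndim == 2:
        degraded = degraded.mean(axis=1)
    # Truncate both signals to the shorter length
    n = min(len(clean), len(degraded))
    return clean[:n], degraded[:n]

# Synthetic data for illustration: a stereo signal and a shorter mono signal
stereo = np.ones((12, 2))
mono = np.zeros(10)
a, b = prepare_pair(stereo, mono)
print(a.shape, b.shape)
```

Truncation is only appropriate when the extra samples are trailing padding; if the two recordings are offset in time, they need to be aligned first.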

Matlab code & Testing

All the Matlab code in this repo is taken or adapted from the original STOI code (STOI – Short-Time Objective Intelligibility Measure) written by Cees Taal.

Thanks to Cees Taal who open-sourced his Matlab implementation and enabled thorough testing of this python code.

If you want to run the tests, you will need Matlab, matlab.engine (install instructions here) and matlab_wrapper (install with pip install matlab_wrapper). The tests can only be run under Python 2.7, as matlab.engine and matlab_wrapper are only compatible with Python 2.7. Tests pass at relative and absolute tolerances of 1e-3, which is enough for the considered application (all the variability comes from the resampling method when signals are not natively sampled at 10 kHz).
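A tolerance check of the kind described above can be written with NumPy. This is a sketch only: `within_tolerance` is a hypothetical name, and the scores are made-up numbers, not results from the test suite.

```python
import numpy as np

def within_tolerance(python_value, matlab_value, tol=1e-3):
    """Return True if the two values agree at relative and absolute tolerance `tol`."""
    # np.isclose checks |a - b| <= atol + rtol * |b|
    return bool(np.isclose(python_value, matlab_value, rtol=tol, atol=tol))

# Hypothetical scores, for illustration only (not real test results)
close_pair = within_tolerance(0.7512, 0.7508)
far_pair = within_tolerance(0.7600, 0.7500)
print(close_pair, far_pair)
```

Because both tolerances are applied together, the allowed difference for a score near 0.75 is about 0.00175 rather than 0.001.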

Very big thanks to @gauss256 who translated all the matlab scripts to Octave, and wrote all the tests for it!

Contribute

Any contributions are welcome (improving execution speed was the original request; thank you Przemek Pobrotyn for a 4x speed-up!):

  • Improve the resampling method to match Matlab's resampling in tests/. This can be considered a solved issue thanks to @gauss256!
  • Write tests for Python 3 (with transplant for example)

References

  • [1] C. H. Taal, R. C. Hendriks, R. Heusdens, J. Jensen, 'A Short-Time Objective Intelligibility Measure for Time-Frequency Weighted Noisy Speech', ICASSP 2010, Dallas, Texas.
  • [2] C. H. Taal, R. C. Hendriks, R. Heusdens, J. Jensen, 'An Algorithm for Intelligibility Prediction of Time-Frequency Weighted Noisy Speech', IEEE Transactions on Audio, Speech, and Language Processing, 2011.
  • [3] J. Jensen and C. H. Taal, 'An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers', IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016.
Comments
  • remove_silent_frames

    I think the way the signal is being reconstructed in remove_silent_frames is not quite right. The evidence for this is that if you start with a signal that has no silence, the output of remove_silent_frames should be the same as the input, but it's not.

    The problem is that the window does not satisfy the COLA constraint so the overlap-add technique needs to be modified to compensate.

    I know how to fix this but I wanted to get your opinion before preparing a PR. The issue is that the problem lies in the original STOI MATLAB code, not in pystoi. So if it is fixed, tests that compare to the output of MATLAB will fail.

    The magnitude of the error is probably not large, so it's a tradeoff. Is it better to have correct silence removal or consistency with MATLAB output?
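The COLA (constant overlap-add) condition mentioned above can be checked numerically. The sketch below assumes the analysis window is a Hann window with its zero endpoints dropped, `np.hanning(framelen + 2)[1:-1]`, with a frame length of 256 and a hop of 128. These values are assumptions about the implementation, not taken from this discussion, and `overlap_sum` is a hypothetical helper.

```python
import numpy as np

def overlap_sum(window, hop, n_frames=8):
    """Overlap-add `n_frames` shifted copies of `window`; return the fully overlapped part."""
    n = len(window)
    total = np.zeros(hop * (n_frames - 1) + n)
    for i in range(n_frames):
        total[i * hop:i * hop + n] += window
    # Discard the edges, where fewer windows overlap
    return total[n:-n]

framelen, hop = 256, 128  # assumed frame length and hop

# Symmetric Hann with the zero endpoints dropped (assumed to match the STOI code)
symmetric = np.hanning(framelen + 2)[1:-1]
# Periodic Hann, which satisfies COLA at 50% overlap
periodic = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(framelen) / framelen)

dev_symmetric = np.max(np.abs(overlap_sum(symmetric, hop) - 1.0))
dev_periodic = np.max(np.abs(overlap_sum(periodic, hop) - 1.0))
print(f"symmetric: {dev_symmetric:.2e}, periodic: {dev_periodic:.2e}")
```

Under these assumptions the overlap sum of the symmetric window deviates from 1 by a fraction of a percent, while the periodic window's deviation is at floating-point level, which is consistent with the remark that the error is probably not large.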

    opened by gauss256 14
  • Vectorization

    As per the contribution request, this pull request vectorizes a number of computations, specifically:

    • Vectorization of nearly all of utils.remove_silent_frames
    • Vectorization of computations in both classical and extended STOI

    There are still a few for loops left in the form of list comprehensions, though.

    As a result, we obtain a nearly 4-fold speed up in computations of classical STOI and 1.5-fold speed up in computations of extended STOI.

    We also demonstrate that the results of vectorized implementation agree with the previous one, within tolerance. See documentation for np.allclose for details about tolerance threshold.

    The minuscule differences in results are most likely due to numerical inaccuracies when performing matrix operations vs doing computations in loops.
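To illustrate the kind of change described above, here is a minimal sketch that frames a signal with a Python loop and with index arithmetic, then compares windowed frame energies with np.allclose. It is not the code from this pull request; `frame_loop` and `frame_vectorized` are hypothetical names, and the frame length and hop are arbitrary.

```python
import numpy as np

def frame_loop(x, framelen, hop):
    """Slice `x` into overlapping frames with a Python loop."""
    frames = [x[i:i + framelen] for i in range(0, len(x) - framelen + 1, hop)]
    return np.array(frames)

def frame_vectorized(x, framelen, hop):
    """Slice `x` into the same frames with a single fancy-indexing operation."""
    n_frames = 1 + (len(x) - framelen) // hop
    # Row i holds the sample indices of frame i
    idx = np.arange(framelen)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
w = np.hanning(256)

# Per-frame energy, once frame by frame and once as a single matrix operation
energy_loop = np.array([np.sum((f * w) ** 2) for f in frame_loop(x, 256, 128)])
energy_vec = np.sum((frame_vectorized(x, 256, 128) * w) ** 2, axis=1)
print(np.allclose(energy_loop, energy_vec))
```

The two results may differ in the last few bits because the summation order differs, which is why a tolerance-based comparison is used instead of exact equality.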

    See the attached screenshots for results and time comparisons.

    [Two screenshots, dated 2018-07-18: results and time comparisons]

    Before merging, please update the README file accordingly. Thank you!

    opened by PrzemekPobrotyn 7
  • TensorFlow

    I am working on a version of STOI in TensorFlow. I am posting this here just in case @mpariente or someone else is also working on that. We could avoid duplicating effort.

    opened by gauss256 7
  • Correct size of removed silence arrays

    The code was not removing silent frames properly. This was not caught in the unit tests because there are no silent frames in the test data.

    I have confirmed the commit in my own unit tests, but they are based on Octave rather than MATLAB and it would take some work (which I may yet do) to commit them.

    The unit test data consists of random numbers. To reproduce the problem and verify the fix, zero out the middle third of the data. Then compare the results of pystoi to MATLAB or Octave.
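The reproduction recipe above can be sketched as follows. This is an illustration only, not the pystoi implementation: `silent_frame_mask` is a hypothetical function, and the frame length (256), hop (128), window and 40 dB dynamic range are assumed values.

```python
import numpy as np

def silent_frame_mask(x, framelen=256, hop=128, dyn_range=40.0):
    """Return a boolean mask that is True for frames more than `dyn_range` dB below the loudest."""
    eps = np.finfo(float).eps
    n_frames = 1 + (len(x) - framelen) // hop
    idx = np.arange(framelen)[None, :] + hop * np.arange(n_frames)[:, None]
    w = np.hanning(framelen + 2)[1:-1]  # assumed analysis window
    # Frame energies in dB; eps avoids log10(0) on all-zero frames
    energies = 20 * np.log10(np.linalg.norm(x[idx] * w, axis=1) + eps)
    return energies < energies.max() - dyn_range

rng = np.random.default_rng(0)
x = rng.standard_normal(3000)
# Zero out the middle third of the data, as suggested above
x[1000:2000] = 0.0

silent = silent_frame_mask(x)
print(f"{int(silent.sum())} of {len(silent)} frames flagged as silent")
```

Only frames that lie entirely inside the zeroed region are flagged; frames that straddle its edges still carry enough energy to be kept.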

    opened by gauss256 6
  • AxisError when signal contains silence

    The stoi function produces an error if a reference signal only contains a short piece of speech. This seems to be caused by the removal of silent frames.

    This is a minimal example using WSJ0-2mix data. Replace wsj0_2mix_root with the root of the WSJ0-2mix data. You might have to remove the suffix _2 if you have a newer version of the WSJ0-2mix database:

    from pathlib import Path
    from pystoi.stoi import stoi
    import soundfile as sf
    
    wsj0_2mix_root = Path('<path to WSJ0-2mix root dir>')
    
    observation = sf.read(str(wsj0_2mix_root / 'data/2speakers/wav8k/min/cv/mix/40ba0112_1.2757_01nc0218_-1.2757.wav'))[0]
    target = sf.read(str(wsj0_2mix_root / 'data/2speakers/wav8k/min/cv/s2/40ba0112_1.2757_01nc0218_-1.2757_2.wav'))[0]
    
    stoi(target, observation, 8000)
    
    ---------------------------------------------------------------------------
    AxisError                                 Traceback (most recent call last)
    <ipython-input-167-eb5a1701f57b> in <module>
          9 
         10 
    ---> 11 stoi(target, observation, 8000)
    
    .../python3.7/site-packages/pystoi/stoi.py in stoi(x, y, fs_sig, extended)
         75         # Find normalization constants and normalize
         76         normalization_consts = (
    ---> 77             np.linalg.norm(x_segments, axis=2, keepdims=True) /
         78             (np.linalg.norm(y_segments, axis=2, keepdims=True) + utils.EPS))
         79         y_segments_normalized = y_segments * normalization_consts
    
    .../python3.7/site-packages/numpy/linalg/linalg.py in norm(x, ord, axis, keepdims)
       2479             # special case for speedup
       2480             s = (x.conj() * x).real
    -> 2481             return sqrt(add.reduce(s, axis=axis, keepdims=keepdims))
       2482         else:
       2483             try:
    
    AxisError: axis 2 is out of bounds for array of dimension 1
    

    Is this a bug in the implementation or a general flaw of the STOI metric? Do you have a suggestion on how to handle this issue?

    opened by thequilo 5
  • Is there any difference in Resample() between Matlab and Octave?

    This code is really helpful for my study. Thank you for this awesome work! There is no issue I am going to raise. Just some questions about MATLAB and Octave.

    The sample rate of my audio is 16000 Hz. According to the README file, the tests will fail if I use Python to do the resampling. My question is about the resample() function in MATLAB and Octave: are they equivalent? It would be much appreciated if someone could answer this.
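For the 16000 Hz case raised above, the integer up/down factors of a polyphase resampler targeting 10000 Hz can be derived as in this sketch. `resample_ratio` is a hypothetical helper, and the 10000 Hz target is taken from the README's mention of 10 kHz.

```python
from fractions import Fraction

def resample_ratio(fs_in, fs_out=10000):
    """Return the (up, down) integer factors that convert fs_in to fs_out."""
    ratio = Fraction(fs_out, fs_in)  # reduced to lowest terms automatically
    return ratio.numerator, ratio.denominator

up, down = resample_ratio(16000)
print(up, down)  # 5 8
```

These factors could be passed to a polyphase resampler such as scipy.signal.resample_poly(x, up, down). Whether its default anti-aliasing filter matches the one used by MATLAB's or Octave's resample() is exactly the open question here, so no equivalence is claimed.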

    opened by NearLinHere 5
  • Numpy.dtype Size change

    Is this the expected error message when I run the function with 16,000 Hz wav files, as opposed to 10kHz?

    RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88

    opened by christianhwa 5
  • Introduce a faster overlap_and_add method

    This PR introduces a new overlap_and_add method, inspired by the one implemented in TensorFlow (https://github.com/tensorflow/tensorflow/blob/v2.7.0/tensorflow/python/ops/signal/reconstruction_ops.py#L30-L167), that is ~50 times faster than the previous one on a 30-second audio clip because it vectorises the code instead of using a for loop.
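As an illustration of replacing the loop, the sketch below compares a loop-based overlap-add with a loop-free variant built on np.bincount. This is not the method from this PR or from TensorFlow; it is a different technique chosen for brevity, and both function names are hypothetical.

```python
import numpy as np

def overlap_add_loop(frames, hop):
    """Overlap-add the rows of `frames` with a Python loop."""
    n_frames, framelen = frames.shape
    out = np.zeros((n_frames - 1) * hop + framelen)
    for i in range(n_frames):
        out[i * hop:i * hop + framelen] += frames[i]
    return out

def overlap_add_bincount(frames, hop):
    """Overlap-add without a Python loop: sum samples that share an output index."""
    n_frames, framelen = frames.shape
    out_len = (n_frames - 1) * hop + framelen
    # Output position of every sample of every frame
    idx = np.arange(framelen)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.bincount(idx.ravel(), weights=frames.ravel(), minlength=out_len)

rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 256))
ola_loop = overlap_add_loop(frames, 128)
ola_vec = overlap_add_bincount(frames, 128)
print(np.allclose(ola_loop, ola_vec))
```

No speed figure is claimed for this variant; the point is only that the accumulation can be expressed as a single array operation with the same result.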

    opened by giamic 3
  • Is resampling really required?

    Hi, the original paper (http://cas.et.tudelft.nl/pubs/Taal2010.pdf) mentions at the start of section 2 that the metric is supposed to be used on audio at a sampling rate of 10000 Hz. Is this really necessary? I get fairly similar results regardless of whether or not I resample my audio.

    Thanks!

    opened by anujstam 3
  • some int casts and prevented a log of 0

    As mentioned, very nice that you did a port to Python. I tried it in Python 3 and had to make some small tweaks. I did not run the tests because I don't have a Matlab license (another very good reason to have a Python version :) ).

    opened by chtaal 3
  • Ability for batched tensor

    Thank you for the code! I have a question: can this code compute STOI for a batched tensor of size [B, num_of_samples], or even [B, num_speaker, num_of_samples]? I checked and I think it cannot, right? Could you extend the implementation to cover this scenario, please?
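Independently of what the library supports, a per-signal metric can be applied over leading batch dimensions with a thin wrapper. The sketch below is hypothetical: `apply_batched` is not part of pystoi, and `correlation` is a stand-in metric used only so the example runs without audio data. In practice the metric argument would be a function with the same (clean, degraded, fs) call shape, such as stoi.

```python
import numpy as np

def apply_batched(metric, clean, degraded, fs):
    """Hypothetical wrapper: apply a per-signal metric over all leading dimensions."""
    clean = np.asarray(clean)
    degraded = np.asarray(degraded)
    if clean.shape != degraded.shape:
        raise ValueError("clean and degraded must have the same shape")
    lead_shape = clean.shape[:-1]
    # Flatten [B, ..., num_of_samples] to [B * ..., num_of_samples]
    flat_clean = clean.reshape(-1, clean.shape[-1])
    flat_degraded = degraded.reshape(-1, degraded.shape[-1])
    scores = [metric(c, d, fs) for c, d in zip(flat_clean, flat_degraded)]
    return np.array(scores).reshape(lead_shape)

def correlation(c, d, fs):
    """Stand-in metric for demonstration only; fs is unused."""
    return float(np.corrcoef(c, d)[0, 1])

rng = np.random.default_rng(0)
clean_batch = rng.standard_normal((2, 3, 1000))  # [B, num_speaker, num_of_samples]
noisy_batch = clean_batch + 0.1 * rng.standard_normal(clean_batch.shape)
scores = apply_batched(correlation, clean_batch, noisy_batch, 10000)
print(scores.shape)
```

This loops over the batch in Python, so it adds convenience rather than speed.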

    opened by MordehayM 2
  • Future warnings raised

    Hi, I'm running stoi(signal1, signal2, sr, extended=True) where signal1 and signal2 are both numpy.ndarray

    and I'm getting the following future warning: /usr/lib/python3/dist-packages/scipy/signal/signaltools.py:2383: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result. return y[keep]

    Any idea how to avoid this from happening?
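One generic way to silence a FutureWarning around a single call is Python's standard warnings module, sketched below. `call_quietly` and `noisy_add` are hypothetical names; `noisy_add` merely stands in for a function that emits the warning. Since the message comes from SciPy's own code, a newer SciPy release may also make it disappear, but that is an assumption not verified here.

```python
import warnings

def call_quietly(func, *args, **kwargs):
    """Call `func` with FutureWarning suppressed for the duration of the call only."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        return func(*args, **kwargs)

def noisy_add(a, b):
    """Stand-in for a library call that emits a FutureWarning."""
    warnings.warn("deprecated indexing", FutureWarning)
    return a + b

# Record any warnings that escape the wrapper
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = call_quietly(noisy_add, 1, 2)

print(result, len(caught))
```

The filter is restored when the with block exits, so other warnings in the program are unaffected.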

    Thanks

    opened by m-mandel 1
  • Batch vectorisation

    Allow users to pass batches of audio waveforms, vectorising the relevant code.

    This adds flexibility to the code and also makes it faster to run on batched data. There are still probably a few points in which we could optimise even more, but this should do for a first iteration.

    I have some tests on my local machine that seem to show that the output hasn't changed. They are not in the PR because I didn't want to keep two versions of every function I changed, it felt confusing and ultimately useless.

    Unfortunately I'm not able to run the octave / matlab tests locally to verify if they still pass.

    opened by giamic 1
  • Exception For Small Inputs

    Currently pystoi.stoi doesn't support small inputs, and raises an uninformative error:

    In [28]:  pystoi.stoi(np.arange(100), np.arange(100), 32000, extended=False)
    ---------------------------------------------------------------------------
    AxisError                                 Traceback (most recent call last)
    <ipython-input-28-3f8d814254e5> in <module>
    ----> 1 pystoi.stoi(np.arange(100), np.arange(100), 32000, extended=False)
    
    ~/venv/py3/lib/python3.7/site-packages/pystoi/stoi.py in stoi(x, y, fs_sig, extended)
         56 
         57     # Remove silent frames
    ---> 58     x, y = utils.remove_silent_frames(x, y, DYN_RANGE, N_FRAME, int(N_FRAME/2))
         59 
         60     # Take STFT
    
    ~/venv/py3/lib/python3.7/site-packages/pystoi/utils.py in remove_silent_frames(x, y, dyn_range, framelen, hop)
        122 
        123     # Compute energies in dB
    --> 124     x_energies = 20 * np.log10(np.linalg.norm(x_frames, axis=1) + EPS)
        125 
        126     # Find boolean mask of energies lower than dynamic_range dB
    
    <__array_function__ internals> in norm(*args, **kwargs)
    
    ~/venv/py3/lib/python3.7/site-packages/numpy-1.19.2-py3.7-linux-x86_64.egg/numpy/linalg/linalg.py in norm(x, ord, axis, keepdims)
       2559             # special case for speedup
       2560             s = (x.conj() * x).real
    -> 2561             return sqrt(add.reduce(s, axis=axis, keepdims=keepdims))
       2562         # None of the str-type keywords for ord ('fro', 'nuc')
       2563         # are valid for vectors
    
    AxisError: axis 1 is out of bounds for array of dimension 1
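A caller can guard against this case with a length check before invoking the metric. The sketch below is not part of pystoi: `has_enough_samples` is a hypothetical helper, and the constants (10 kHz internal rate, 256-sample frames with 50% overlap, 30 frames per analysis segment) are assumptions about the implementation.

```python
import math

FS = 10000      # assumed internal sampling rate in Hz
N_FRAME = 256   # assumed frame length in samples
N_SEGMENT = 30  # assumed number of frames per analysis segment

def has_enough_samples(n_samples, fs_sig):
    """Hypothetical check: can at least one full analysis segment be formed?"""
    # Approximate length after resampling to the internal rate
    n_resampled = math.floor(n_samples * FS / fs_sig)
    hop = N_FRAME // 2
    # Samples needed for N_SEGMENT overlapping frames
    min_len = (N_SEGMENT - 1) * hop + N_FRAME
    return n_resampled >= min_len

print(has_enough_samples(100, 32000))    # the input from the report above
print(has_enough_samples(32000, 32000))  # one second of audio
```

The check is necessary but not sufficient: silent frames are removed before the segments are formed, so a long signal that is mostly silence can still end up too short.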
    
    opened by hovavalon 5
  • Weird STOI Output

    Hi,

    Recently I was trying to evaluate some signals by calculating the STOI of each signal with this package. I used the pystoi.stoi.stoi function to calculate the STOI. When I input two identical signals as ref_signal and processed_signal, it output 1, as expected. However, when I replaced the processed signal with microphone signals recorded with and without background music playing, the STOI of the signal with background music present was always higher, which made no sense. I'm wondering whether I'm using the function the wrong way, or whether there is something wrong with my audio files or my understanding of STOI.

    I've uploaded my audio files at the following website as well as my code to evaluate STOI. https://github.com/nanaChang/stoiCheckFile

    Thank you!

    opened by nanaChang 7
Owner
Pariente Manuel
Audio researcher