OpenL3: Open-source deep audio and image embeddings

Overview

OpenL3 is an open-source Python library for computing deep audio and image embeddings.

Please refer to the documentation for detailed instructions and examples.

UPDATE: OpenL3 now has TensorFlow 2 support!

The audio and image embedding models provided here are published as part of [1], and are based on the Look, Listen and Learn approach [2]. For details about the embedding models and how they were trained, please see:

Look, Listen and Learn More: Design Choices for Deep Audio Embeddings
Jason Cramer, Ho-Hsiang Wu, Justin Salamon, and Juan Pablo Bello.
IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 3852–3856, Brighton, UK, May 2019.

Installing OpenL3

Dependencies

libsndfile

OpenL3 depends on the pysoundfile module to load audio files, which depends on the non-Python library libsndfile. On Windows and macOS, these will be installed via pip and you can therefore skip this step. However, on Linux this must be installed manually via your platform's package manager. For Debian-based distributions (such as Ubuntu), this can be done by simply running

apt-get install libsndfile1

Alternatively, if you are using conda, you can install libsndfile simply by running

conda install -c conda-forge libsndfile

For more detailed information, please consult the pysoundfile installation documentation.

Tensorflow

Starting with openl3>=0.4.0, OpenL3 has been upgraded to TensorFlow 2. Because the standard tensorflow package now includes GPU support, tensorflow>=2.0.0 is included as a dependency and no longer needs to be installed separately.

If you need TensorFlow 1.x, install an older version of OpenL3 with pip install 'openl3<=0.3.1'.

TensorFlow 1.x & OpenL3 <= v0.3.1

Because TensorFlow 1.x comes in separate CPU-only and GPU variants, we leave it up to the user to install the version that best fits their use case.

On most platforms, either of the following commands should properly install TensorFlow:

pip install "tensorflow<1.14" # CPU-only version
pip install "tensorflow-gpu<1.14" # GPU version

For more detailed information, please consult the TensorFlow installation documentation.

Installing OpenL3

The simplest way to install OpenL3 is by using pip, which will also install the additional required dependencies if needed. To install OpenL3 using pip, simply run

pip install openl3

To install the latest version of OpenL3 from source:

  1. Clone or pull the latest version, only retrieving the main branch to avoid downloading the branch where we store the model weight files (these will be properly downloaded during installation).

     git clone git@github.com:marl/openl3.git --branch main --single-branch
    
  2. Install using pip to handle the Python dependencies. The installation also downloads the model files, which requires a stable network connection.

     cd openl3
     pip install -e .
    

Using OpenL3

To help you get started with OpenL3, please see the tutorial.
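
As a quick orientation, the snippet below computes frame-level embeddings for an audio file. It is a minimal sketch of the documented API; /path/to/file.wav is a placeholder, and the keyword arguments shown are the defaults:

    import openl3
    import soundfile as sf

    # Load an audio file (mono or stereo signals both work).
    audio, sr = sf.read('/path/to/file.wav')

    # Compute frame-level embeddings and their timestamps.
    emb, ts = openl3.get_audio_embedding(audio, sr,
                                         input_repr='mel256',
                                         content_type='music',
                                         embedding_size=6144)

Here emb is a 2D array with one row per analysis frame, and ts gives the corresponding timestamps in seconds.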

Acknowledging OpenL3

Please cite the following papers when using OpenL3 in your work:

[1] Look, Listen and Learn More: Design Choices for Deep Audio Embeddings
Jason Cramer, Ho-Hsiang Wu, Justin Salamon, and Juan Pablo Bello.
IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 3852–3856, Brighton, UK, May 2019.

[2] Look, Listen and Learn
Relja Arandjelović and Andrew Zisserman
IEEE International Conference on Computer Vision (ICCV), Venice, Italy, Oct. 2017.

Model Weights License

The model weights are made available under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Comments
  • Implement image embedding API

    Add the image embedding API to the library. This should be fairly similar to the existing audio API. I'll add a candidate interface once I've given it more thought.

    enhancement 
    opened by auroracramer 26
  • Refactor code and models to support TF 2.x and tf.keras

    At some point in the somewhat near future, we should establish support for TF 2.x and tf.keras. The main reasons for this are:

    • To remain compatible with new releases of TF and Keras (the official version of which is now tf.keras) and make use of bug fixes, including fixes for some regression issues. As we have found (#42, #43), installing with newer versions of either package breaks installation and usage.
    • To address multiple vulnerabilities in tensorflow < 1.15.2.
    • To simplify the installation process: since TF 2.x includes support for both CPU and GPU, we can now directly include tensorflow in the project dependencies (as brought up in #39).

    A priori, it seems like the main things to do are:

    • Updating the dependencies in setup.py to include tensorflow
    • Modifying the model definitions to be tf.keras compatible
    • Porting the model files to a format that can be loaded by tf.keras with TF 2.x

    The main concern that comes to mind is the regression tests. We have already seen that tensorflow > 1.13 causes regression tests to fail. I imagine that this will only worsen as we introduce not only a new major release to TF, but also a divergence in Keras with tf.keras. @justinsalamon, what are your thoughts?

    opened by auroracramer 16
  • Add batch processing mode

    Something else to consider is a batch processing mode. i.e. making more efficient use of the GPU by predicting multiple files at once.

    Probably the least messy option would be to separate some of the interior code of get_audio_embedding into its own functions and add a get_audio_embedding_batch function that calls most of the same functions. We would also have a process_audio_file_batch function.

    I thought about changing get_audio_embedding so that it can either take in a single audio array, or a list of audio arrays (and probably a list of corresponding sample rates). While this might consolidate multiple usecases into one function, it'd probably get pretty messy so it's probably best we don't do this.

    Regarding the visual frame embedding extraction, we could ask the same question, though there might be more nuance depending on whether we allow individual images to be processed (I think we should). In the case of videos, multiple frames are already provided at once, so it raises the question (to me at least) of whether get_vframe_embedding (as I'm currently calling it) should support a single frame as well as multiple frames. This also raises the question of whether we allow frames of multiple sizes.

    Thoughts?

    opened by auroracramer 10
  • tensorflow 2.1 doesn't require separate pip installs for gpu and cpu

    Thanks for this great package! We love to use it!

    You state

    Because Tensorflow comes in CPU-only and GPU variants, we leave it up to the user to install the version that best fits their usecase.

    This is not the case anymore as of 2.1, so you could (if 2.1 is supported) make tensorflow part of the standard requirements.

    opened by faroit 8
  • skimage submodules not imported correctly, regression tests fail

    skimage uses lazy imports, so we need to import each submodule explicitly (e.g. import skimage.transform; skimage.transform.rescale(X, s) instead of import skimage; skimage.transform.rescale(X, s)).
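
    To illustrate the fix, a minimal sketch where X and s are stand-in arguments:

        import numpy as np
        import skimage.transform  # import the submodule explicitly; `import skimage` alone is not enough

        X = np.zeros((64, 64))  # stand-in image
        s = 0.5                 # stand-in scale factor
        Y = skimage.transform.rescale(X, s)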

    opened by auroracramer 7
  • Output file format and naming convention

    I have some questions about how to deal with embedding outputs:

    • Should we include the timestamps? If so do we save it in the same file?
    • What format should we use?
      • h5: Nice compression options, but since these typically shouldn't be large, it might be more annoying to deal with than other options
      • npy/npz: Standard approach; numpy arrays can easily be loaded directly (see the sketch after this list)
      • JAMS: Using JAMS would help expand its use and would give a natural way to associate the timestamps with each embedding, but storing all of the values as text might be cumbersome and make the files big, especially if they are long
    • Should we use the embedding type to name the embedding? e.g. example_audio_openl3_6144emb_linear_music.<ext> Or should we just keep it simple?
      • It might be good if the user is comparing different embeddings, but it might be cumbersome if people just want to use a single type of embeddings. Of course we could add an option for this, but adding another option for something like this might be excessive.
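
    For reference, a minimal sketch of the npy/npz option, pairing each embedding with its timestamps in a single file (emb and ts are stand-ins for computed outputs):

        import numpy as np

        emb = np.zeros((10, 6144))  # stand-in embeddings, one row per frame
        ts = np.arange(10) * 0.1    # stand-in timestamps in seconds

        np.savez('example_audio.npz', embedding=emb, timestamps=ts)

        data = np.load('example_audio.npz')
        emb, ts = data['embedding'], data['timestamps']
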
    opened by auroracramer 6
  • Fix API documentation and build

    Fix image embedding size in load_image_embedding_model() docstring, mock missing tensorflow.keras modules in doc/conf.py to fix API documentation build, and remove pin on sphinx version. Addresses #60 and #71.

    opened by auroracramer 4
  • Openl3 0.4.0 - Support for Tensorflow 2

    Figured I would push this out while I'm waiting for something else to build.

    Related PR containing updated models: https://github.com/marl/openl3/pull/61

    Setup Changes

    • Openl3 now requires tensorflow>=2.0.0 and installs it by default (there is no longer a separate GPU package)
    • Now requires kapre>=0.3.5 - TODO: make sure we have the exact minimum kapre version - I remember checking git blame, but haven't tested anything
    • keras as a standalone package was removed from dependencies (we're using tf.keras)
    • travis.yml: removed python 2.7 & 3.5 and added 3.7 & 3.8 since tensorflow only supports 3.6-3.8
      • needed to install Cython first for python 3.8 in order to install skimage (RuntimeError: Cython >= 0.23.4 is required to build scikit-image from git checkout)

    Doc Changes

    • Changed tensorflow dependency message to reflect updates
    • Added "Choosing an Audio Frontend (CPU / GPU)" section to tutorial.rst

    Code Changes

    • core.py
      • added params: get_audio_embedding(frontend='auto'), process_audio_file(frontend='auto'), process_video_file(audio_frontend='auto')
      • Added function preprocess_audio(y, sr, input_repr=None) that encapsulates the librosa frontend (as well as preprocessing for the kapre frontend)
        • for librosa, you pass the input_repr; for kapre inputs, you leave input_repr=None (see the sketch after this list)
    • cli.py - added cli flag (--audio-frontend)
    • models.py
      • added param load_audio_embedding_model(frontend='kapre')
      • using new kapre composite layer helpers get_stft_magnitude_layer
      • disabled latest mag2db code and patched in the legacy version (kapre_v0_1_4_magnitude_to_decibel)
      • kapre is now technically an optional dependency (will only try to import if we try to load a model with kapre frontend)
        • we still install it with setup.py, but if someone wanted to, they could install everything manually without kapre and openl3 should still work for the librosa frontend
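
    A minimal sketch of the two frontends described above (assumes openl3>=0.4.0; the random signal is a stand-in for real audio):

        import numpy as np
        import openl3

        audio, sr = np.random.randn(48000), 48000  # 1 second of stand-in audio

        # Kapre frontend (default): the spectrogram is computed inside the model.
        emb, ts = openl3.get_audio_embedding(audio, sr, frontend='kapre')

        # Librosa frontend: the spectrogram is computed on the CPU outside the model.
        emb, ts = openl3.get_audio_embedding(audio, sr, frontend='librosa')

        # The librosa-style preprocessing can also be invoked directly:
        X = openl3.preprocess_audio(audio, sr, input_repr='mel256')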

    Test Changes

    • we now have separate regression data for kapre/librosa
    • added tests for frontend model following the existing model tests
    • converted some tests to use pytest.mark.parametrize to avoid doubling the length of the tests for testing frontends

    Dev Util Changes

    • added tests/generate_regression.py which generates new regression data
    • added tests/package_weights.py which takes the weights files in the openl3 package folder and gzips them for git push
    • added tests/migration/remove_layers.py which lets us strip out the spectrogram (or any other) layers
    • tests/migration/ has a few other analysis things/notebooks that were used early on in the frontend testing

    Before merging:

    • double check dependency versions
    • are the pinned versions still valid? might need some help with this one
    • Change models download url in setup.py to main repo (currently it's pointing at my fork so I could test with travis)
    • should we integrate changes from https://github.com/marl/openl3/pull/55?
    • should we run the classifier comparison one more time right before merging as a safety check? idk
    opened by beasteers 4
  • Add batch processing functionality

    Adds batch processing functionality to all embedding computation functions and file processing functions, allowing for one or more inputs to be processed. When possible, multiple inputs are put in the same input batch to the network for inference.
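
    A minimal sketch of this batch interface, assuming the list-accepting signature described above (the random signals are stand-ins for real audio):

        import numpy as np
        import openl3

        # Two stand-in signals of different lengths, each with its own sample rate.
        audio_list = [np.random.randn(48000), np.random.randn(96000)]
        sr_list = [48000, 48000]

        # Lists in, lists out: one embedding array and one timestamp array per input.
        emb_list, ts_list = openl3.get_audio_embedding(audio_list, sr_list, batch_size=32)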

    opened by auroracramer 4
  • Add image embedding API

    Adds image embedding API, including functions for processing both images and videos in addition to audio files. Additionally changes the CLI to account for different modalities of inputs (i.e. audio, image, or video).

    opened by auroracramer 4
  • API reference in documentation missing

    When going to https://openl3.readthedocs.io/en/latest/api.html I only see the headers

    Core functionality
    Models functionality
    

    with nothing under each header. Expected would be a list of classes and functions and the associated documentation. At least those APIs that are mentioned in the tutorial.

    opened by jonnor 4
  • Clarification on input representation

    I was just reading through the source code in openl3 > core.py and noticed something in two functions (1. _librosa_linear_frontend and 2. _librosa_mel_frontend). It seems librosa.power_to_db() is being used on a magnitude spectrum, not a power spectrum. Should it instead be using librosa.amplitude_to_db()?
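
    For reference, the two differ by a factor of two in dB: with default arguments, librosa.amplitude_to_db(S) is equivalent to librosa.power_to_db(S**2), so applying power_to_db directly to a magnitude spectrum halves the dB values. A minimal check:

        import numpy as np
        import librosa

        # Magnitude spectrogram of a stand-in signal.
        S = np.abs(librosa.stft(np.random.randn(48000)))

        db_amp = librosa.amplitude_to_db(S)  # 20 * log10(S)
        db_pow = librosa.power_to_db(S**2)   # 10 * log10(S**2) == 20 * log10(S)
        assert np.allclose(db_amp, db_pow)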

    opened by alisonbma 0
  • Example of fine-tuning the audio sub-network.

    I want to perform the fine-tuning of the audio subnetwork to fit my audio classification problem. To this aim, I plan to use the _construct_linear_audio_network, _construct_mel128_audio_network, and _construct_mel256_audio_network functions to load the pre-trained Keras model and then append one or more fully-connected layers to perform the classification.

    However, I don't understand the input shape of these models. According to models.py, the input shape is input_shape = (1, asr * audio_window_dur), where asr = 48000 and audio_window_dur = 1; what is asr, and why does it have that value? Can you please provide an example of using the Keras model on a .wav file?
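
    For what it's worth, asr here is the audio sample rate in Hz, so the models consume 1-second windows of 48 kHz audio with shape (1, 48000). A minimal fine-tuning sketch along the lines the issue describes, using the public loader (the layer sizes and n_classes are hypothetical):

        import tensorflow as tf
        import openl3

        # Load a pre-trained audio embedding model via the public API.
        base = openl3.models.load_audio_embedding_model(
            input_repr='mel256', content_type='music', embedding_size=512)
        base.trainable = False  # freeze the pre-trained weights initially

        n_classes = 10  # hypothetical number of target classes
        clf = tf.keras.Sequential([
            base,
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dense(n_classes, activation='softmax'),
        ])
        clf.compile(optimizer='adam', loss='sparse_categorical_crossentropy')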

    I really appreciate any help you can provide.

    opened by mattiacampana 0
  • Extract activation from lower audio layers

    Hi, I was wondering how I can extract activations from the lower audio layers. I guess "embeddings" are the same as "MaxPool_3"? And if that's correct, do "MaxPool", "MaxPool_1", and "MaxPool_2" correspond to the first, second, and third max-pooling layers in the audio ConvNet as explained in Arandjelović and Zisserman 2018 (https://arxiv.org/abs/1712.06651)?
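
    One way to inspect intermediate activations is to wrap the loaded model in a sub-model that exposes the layer of interest; a minimal sketch, where the layer name is hypothetical and should be taken from model.summary():

        import tensorflow as tf
        import openl3

        model = openl3.models.load_audio_embedding_model(
            input_repr='mel256', content_type='music', embedding_size=6144)
        model.summary()  # list the actual layer names

        layer_name = 'max_pooling2d_2'  # hypothetical; pick the max-pooling layer you want
        activations = tf.keras.Model(inputs=model.input,
                                     outputs=model.get_layer(layer_name).output)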

    opened by seunggookim 1
  • m1 macos installation problem

    Hi, I am using an M1 MacBook, and when I try to install openl3 the installation fails while pip tries to build h5py from source, even though h5py is already installed in my virtual environment. Condensed build output:

        building 'h5py.defs' extension
        clang ... -DH5_USE_16_API -I./h5py ... -c h5py/defs.c ...
        warning: "Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
        h5py/defs.c:16556:56: error: too few arguments to function call, expected 3, have 2
            __pyx_t_1 = H5Oget_info(__pyx_v_loc_id, __pyx_v_oinfo); ...
        H5Opublic.h:497:15: note: 'H5Oget_info3' declared here
        h5py/defs.c:16671:95: error: too few arguments to function call, expected 5, have 4
            __pyx_t_1 = H5Oget_info_by_name(__pyx_v_loc_id, __pyx_v_name, __pyx_v_oinfo, __pyx_v_lapl_id); ...
        h5py/defs.c:16786:144: error: too few arguments to function call, expected 8, have 7
            __pyx_t_1 = H5Oget_info_by_idx(...); ...
        h5py/defs.c:17821:100: error: too few arguments to function call, expected 6, have 5
            __pyx_t_1 = H5Ovisit(...); ...
        h5py/defs.c:17936:143: error: too few arguments to function call, expected 8, have 7
            __pyx_t_1 = H5Ovisit_by_name(...); ...
        h5py/defs.c:34606:68: error: too few arguments to function call, expected 4, have 3
            __pyx_t_1 = H5Sencode(__pyx_v_obj_id, __pyx_v_buf, __pyx_v_nalloc); ...
        H5Spublic.h:373:15: note: 'H5Sencode2' declared here
        2 warnings and 6 errors generated.
        error: command '/usr/bin/clang' failed with exit code 1

        note: This error originates from a subprocess, and is likely not a problem with pip.
        WARNING: No metadata found in /opt/homebrew/Caskroom/miniforge/base/envs/pytorch_env/lib/python3.8/site-packages
        Rolling back uninstall of h5py
        error: legacy-install-failure

        × Encountered error while trying to install package.
        ╰─> h5py

        note: This is an issue with the package mentioned above, not pip.
        hint: See above for output from the failure.

    My python version: 3.8.13

    By the way, I have tried to build from source, but this problem with h5py still exists.

    opened by yy945635407 3
Releases (v0.4.1)
  • v0.4.1(Aug 6, 2021)

    Release version 0.4.1 of OpenL3.

    • Add librosa as an explicit dependency
    • Remove upper limit pinning for scikit-image dependency
    • Fix version number typo in README
    • Update TensorFlow information in README
  • v0.4.0(Aug 6, 2021)

    Release version 0.4.0 of OpenL3.

    • Upgraded to tensorflow>=2.0.0. TensorFlow is now included as a dependency because the standard package supports both CPU and GPU.
    • Upgraded to kapre>=0.3.5. Reverted the magnitude scaling method to match kapre<=0.1.4, as that is what the model was trained on.
    • Removed Python 2 and 3.5 support, as they are not supported by TensorFlow 2 (and added 3.7 & 3.8)
    • Add librosa frontend, and allow frontend to be configurable between kapre and librosa
      • Added frontend='kapre' parameter to get_audio_embedding, process_audio_file, and load_audio_embedding_model
      • Added audio_frontend='kapre' parameter to process_video_file and the CLI
      • Added frontend='librosa' flag to load_audio_embedding_model for use with a librosa or other external frontend
      • Added a openl3.preprocess_audio function that computes the input features needed for each frontend
    • Model .h5 files no longer have Kapre layers in them and are all loadable with tf.keras
    • Made the skimage and moviepy.video.io.VideoFileClip imports lazy
    • Added new regression data for both Kapre 0.3.5 and Librosa
    • Parameterized some of the tests to reduce duplication
    • Added developer helpers for regression data, weight packaging, and .h5 file manipulation
  • v0.4.0rc2(May 30, 2021)

  • v0.4.0rc1(May 30, 2021)

  • v0.4.0rc0(May 30, 2021)

  • v0.3.1(Feb 28, 2020)

    Release version 0.3.1 of OpenL3.

    • Require keras>=2.0.9,<2.3.0 in dependencies to avoid forced installation of TF 2.x during pip installation.
    • Update README and installation docs to explicitly state that we do not yet support TF 2.x and to offer a working dependency combination.
    • Require kapre==0.1.4 in dependencies to avoid installing tensorflow>=1.14, which breaks regression tests.
  • v0.3.1rc0(Feb 28, 2020)

    Release candidate 0 of version 0.3.1.

    • Require keras>=2.0.9,<2.3.0 in dependencies to avoid forced installation of TF 2.x during pip installation.
    • Update README and installation docs to explicitly state that we do not yet support TF 2.x and to offer a working dependency combination.
    • Require kapre==0.1.4 in dependencies to avoid installing tensorflow>=1.14, which breaks regression tests.
  • v0.3.0(Jan 23, 2020)

    Release version 0.3.0 of OpenL3.

    • Rename audio related embedding functions to indicate that they are specific to audio.
    • Add image embedding functionality to API and CLI.
    • Add video processing functionality to API and CLI.
    • Add batch processing functionality to API and CLI to more efficiently process multiple inputs.
    • Update documentation with new functionality.
    • Address build issues with updated dependencies.
  • v0.3.0rc0(Jan 23, 2020)

    Release candidate 0 of version 0.3.0.

    • Rename audio related embedding functions to indicate that they are specific to audio.
    • Add image embedding functionality to API and CLI.
    • Add video processing functionality to API and CLI.
    • Add batch processing functionality to API and CLI to more efficiently process multiple inputs.
    • Update documentation with new functionality.
    • Address build issues with updated dependencies.
  • v0.2.0(Apr 18, 2019)

    Release version 0.2.0 of OpenL3.

    • Update embedding models with ones that have been trained with the kapre bug fixed.
    • Allow loaded models to be passed in and used in process_file and get_embedding.
    • Rename get_embedding_model to load_embedding_model.
  • v0.2.0rc0(Apr 13, 2019)

    Release candidate 0 of version 0.2.0

    • Update embedding models with ones that have been trained with the kapre bug fixed.
    • Allow loaded models to be passed in and used in process_file and get_embedding.
    • Rename get_embedding_model to load_embedding_model.
  • v0.1.1(Mar 7, 2019)

    Release of v0.1.1 of OpenL3.

    Update kapre to fix issue with dynamic range normalization for decibel computation when computing spectrograms.

  • v0.1.1rc1(Mar 6, 2019)

  • v0.1.1rc0(Feb 21, 2019)

    Release candidate 0 of version 0.1.1

    Update kapre to fix issue with dynamic range normalization for decibel computation when computing spectrograms.

  • v0.1.0(Nov 22, 2018)

  • v0.1.0rc6(Nov 20, 2018)

  • v0.1.0rc5(Nov 20, 2018)

  • v0.1.0rc4(Nov 20, 2018)

    Release candidate 4 of version 0.1.0

    This release also updates the PyPI keywords, and moves the model files directly into the module directory (instead of creating a subdirectory) to make the pip installation process easier when installing with PyPI.

  • v0.1.0rc3(Nov 20, 2018)

  • v0.1.0rc2(Nov 20, 2018)

  • v0.1.0rc1(Nov 20, 2018)

  • v0.1.0rc0(Nov 20, 2018)

Owner
Music and Audio Research Laboratory - NYU