A lightweight yet powerful audio-to-MIDI converter with pitch bend detection

Spotify

Last update: Jan 1, 2023

Related tags

Audio music lightweight machine-learning midi transcription pitch-detection polyphonic

Overview

Basic Pitch is a Python library for Automatic Music Transcription (AMT), using a lightweight neural network developed by Spotify's Audio Intelligence Lab. It's small, easy to use, and pip install-able.

Basic Pitch may be simple, but it's far from "basic"! basic-pitch is efficient and easy to use, and its multipitch support, its ability to generalize across instruments, and its note accuracy compete with much larger and more resource-hungry AMT systems.

Provide a compatible audio file and basic-pitch will generate a MIDI file, complete with pitch bends. Basic Pitch is instrument-agnostic and supports polyphonic instruments, so you can freely enjoy transcription of all your favorite music, no matter what instrument is used. Basic Pitch works best on one instrument at a time.

Research Paper

This library was released in conjunction with Spotify's publication at ICASSP 2022. You can read more about this research in the paper, A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation.

If you use this library in academic research, consider citing it:

@inproceedings{2022_BittnerBRME_LightweightNoteTranscription_ICASSP,
  author= {Bittner, Rachel M. and Bosch, Juan Jos\'e and Rubinstein, David and Meseguer-Brocal, Gabriel and Ewert, Sebastian},
  title= {A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation},
  booktitle= {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  address= {Singapore},
  year= 2022,
}

Demo

If, for whatever reason, you're not yet completely inspired, or you're just like so totally over the general vibe and stuff, check out our snappy demo website, basicpitch.io, to experiment with our model on whatever music audio you provide!

Installation

basic-pitch is available via PyPI. To install the current release:

pip install basic-pitch

To update Basic Pitch to the latest version, add --upgrade to the above command.

Compatible Environments:

MacOS, Windows and Ubuntu operating systems
Python versions 3.7, 3.8, 3.9

Usage

Model Prediction

Command Line Tool

This library offers a command line tool interface. A basic prediction command will generate and save a MIDI file transcription of audio at the <input-audio-path> to the <output-directory>:

basic-pitch <output-directory> <input-audio-path>

To process more than one audio file at a time:

basic-pitch <output-directory> <input-audio-path-1> <input-audio-path-2> <input-audio-path-3>

Optionally, you may append any of the following flags to your prediction command to save additional formats of the prediction output to the <output-directory>:

--sonify-midi to additionally save a .wav audio rendering of the MIDI file
--save-model-outputs to additionally save raw model outputs as an NPZ file
--save-note-events to additionally save the predicted note events as a CSV file

To discover more parameter control, run:

basic-pitch --help
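For batch jobs it can be handy to assemble the command from Python. The following is a minimal sketch that only builds the argument list, using the positional order and the three optional flags described above. The helper name `build_basic_pitch_command` is hypothetical and is not part of the library.

```python
from typing import List, Sequence


def build_basic_pitch_command(
    output_dir: str,
    audio_paths: Sequence[str],
    sonify_midi: bool = False,
    save_model_outputs: bool = False,
    save_note_events: bool = False,
) -> List[str]:
    """Assemble the CLI call: output directory first, then the audio files."""
    if not audio_paths:
        raise ValueError("at least one input audio path is required")
    command = ["basic-pitch", output_dir, *audio_paths]
    # Each optional flag adds one extra output format to the output directory.
    if sonify_midi:
        command.append("--sonify-midi")
    if save_model_outputs:
        command.append("--save-model-outputs")
    if save_note_events:
        command.append("--save-note-events")
    return command


command = build_basic_pitch_command("out", ["a.wav", "b.mp3"], save_note_events=True)
print(command)
# To actually run it (requires basic-pitch to be installed):
# import subprocess; subprocess.run(command, check=True)
```

Passing the command as a list rather than a single string avoids shell-quoting problems with file names that contain spaces.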

Programmatic

predict()

Import basic-pitch into your own Python code and run the predict functions directly, providing an <input-audio-path> and returning the model's prediction results:

from basic_pitch.inference import predict
from basic_pitch import ICASSP_2022_MODEL_PATH

model_output, midi_data, note_events = predict(<input-audio-path>)

<minimum-frequency> & <maximum-frequency> (floats) set the minimum and maximum allowed note frequency, in Hz, returned by the model. Pitch events with frequencies outside of this range will be excluded from the prediction results.
model_output is the raw model inference output
midi_data is the transcribed MIDI data derived from the model_output
note_events is a list of note events derived from the model_output
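To illustrate what a frequency range means in terms of notes, here is a minimal sketch that converts MIDI pitches to Hz and drops events outside a range. It assumes each note event is a tuple laid out as (start_s, end_s, midi_pitch, amplitude); that layout and the helper names are assumptions made for this example, and the filtering shown is not the library's internal implementation.

```python
from typing import List, Optional, Tuple

# Assumed layout for this sketch: (start_s, end_s, midi_pitch, amplitude).
NoteEvent = Tuple[float, float, int, float]


def midi_to_hz(midi_pitch: float) -> float:
    """Convert a MIDI note number to Hz (A4 = MIDI 69 = 440 Hz, equal temperament)."""
    return 440.0 * 2.0 ** ((midi_pitch - 69) / 12.0)


def filter_by_frequency(
    events: List[NoteEvent],
    minimum_frequency: Optional[float] = None,
    maximum_frequency: Optional[float] = None,
) -> List[NoteEvent]:
    """Keep only the events whose pitch lies inside the given range in Hz."""
    kept = []
    for event in events:
        hz = midi_to_hz(event[2])
        if minimum_frequency is not None and hz < minimum_frequency:
            continue
        if maximum_frequency is not None and hz > maximum_frequency:
            continue
        kept.append(event)
    return kept


# Hypothetical events: A2 (110 Hz), A4 (440 Hz), C7 (about 2093 Hz).
events = [(0.0, 0.5, 45, 0.8), (0.5, 1.0, 69, 0.6), (1.0, 1.5, 96, 0.4)]
print(filter_by_frequency(events, minimum_frequency=200.0, maximum_frequency=1000.0))
```

With a range of 200 Hz to 1000 Hz, only the A4 event survives in this example.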

predict() in a loop

To run prediction within a loop, you'll want to load the model yourself and provide predict() with the loaded model object itself to be used for repeated prediction calls, in order to avoid redundant and sluggish model loading.

import tensorflow as tf

from basic_pitch.inference import predict
from basic_pitch import ICASSP_2022_MODEL_PATH

basic_pitch_model = tf.saved_model.load(str(ICASSP_2022_MODEL_PATH))

for audio_path in <input-audio-path-list>:
    ...
    # Reuse the already-loaded model instead of reloading it on every call.
    model_output, midi_data, note_events = predict(
        audio_path,
        basic_pitch_model,
    )
    ...

predict_and_save()

If you would like basic-pitch to orchestrate the generation and saving of our various supported output file types, you may use predict_and_save instead of using predict directly:

from basic_pitch.inference import predict_and_save

predict_and_save(
    <input-audio-path-list>,
    <output-directory>,
    <save-midi>,
    <sonify-midi>,
    <save-model-outputs>,
    <save-note-events>,
)

where:

<input-audio-path-list> & <output-directory>
- the list of audio file paths for basic-pitch to read from, and the directory path for it to write to.
<save-midi>
- bool to control generating and saving a MIDI file to the <output-directory>
<sonify-midi>
- bool to control saving a WAV audio rendering of the MIDI file to the <output-directory>
<save-model-outputs>
- bool to control saving the raw model output as a NPZ file to the <output-directory>
<save-note-events>
- bool to control saving predicted note events as a CSV file to the <output-directory>

Model Input

Supported Audio Codecs

basic-pitch accepts all sound files that are compatible with its version of librosa, including:

.mp3
.ogg
.wav
.flac
.m4a

Mono Channel Audio Only

While you may use stereo audio as an input to our model, at prediction time, the channels of the input will be down-mixed to mono, and then analyzed and transcribed.
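As a rough illustration of down-mixing, the sketch below averages the channels sample by sample, which is a common way to produce mono audio. Whether basic-pitch does exactly this internally is an assumption here, and `downmix_to_mono` is a hypothetical helper name.

```python
from typing import List, Sequence


def downmix_to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length channels sample by sample."""
    if not channels:
        raise ValueError("need at least one channel")
    length = len(channels[0])
    if any(len(channel) != length for channel in channels):
        raise ValueError("all channels must have the same length")
    # zip(*channels) groups the samples that share the same time index.
    return [sum(samples) / len(channels) for samples in zip(*channels)]


left = [0.5, 0.25, -1.0]
right = [0.5, 0.75, 1.0]
print(downmix_to_mono([left, right]))  # [0.5, 0.5, 0.0]
```

Note that content panned in opposite phase between the two channels cancels out when averaged, as the last sample shows.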

File Size/Audio Length

This model can process any size or length of audio, but processing of larger/longer audio files could be limited by your machine's available disk space. To process these files, we recommend streaming the audio of the file, processing windows of audio at a time.
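One way to process audio a window at a time is to compute the sample ranges up front and load only one range per step. The sketch below shows just that index arithmetic; the 30-second window length and the helper name `window_bounds` are assumptions chosen for illustration, not values taken from the library.

```python
from typing import Iterator, Tuple


def window_bounds(
    total_samples: int, window_samples: int, hop_samples: int
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices; the last window may be shorter."""
    if window_samples <= 0 or hop_samples <= 0:
        raise ValueError("window and hop sizes must be positive")
    start = 0
    while start < total_samples:
        yield start, min(start + window_samples, total_samples)
        start += hop_samples


SAMPLE_RATE = 22050    # Hz
WINDOW_SECONDS = 30    # hypothetical window length for this example
total = 10 * 60 * SAMPLE_RATE  # ten minutes of audio
window = WINDOW_SECONDS * SAMPLE_RATE
windows = list(window_bounds(total, window, window))
print(len(windows), windows[0], windows[-1])
```

Choosing a hop smaller than the window makes consecutive windows overlap, which can help with notes that straddle a window boundary, at the cost of having to merge duplicate detections afterwards.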

Sample Rate

Input audio may be of any sample rate; however, all audio will be resampled to 22050 Hz before processing.
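Resampling keeps the duration and changes the number of samples. The sketch below works out the resulting length for a 22050 Hz target; rounding up to a whole sample is an assumption of this example, and `resampled_length` is a hypothetical helper name.

```python
TARGET_SAMPLE_RATE = 22050  # Hz, the rate stated above


def resampled_length(
    num_samples: int, source_rate: int, target_rate: int = TARGET_SAMPLE_RATE
) -> int:
    """Number of samples after resampling, rounded up to a whole sample."""
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-num_samples * target_rate // source_rate)


# Two seconds of 44100 Hz audio become 44100 samples at 22050 Hz.
print(resampled_length(88200, 44100))
```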

Contributing

Contributions to basic-pitch are welcomed! See CONTRIBUTING.md for details.

Copyright and License

This software is licensed under the Apache License, Version 2.0 (the "Apache License"). You may choose either license to govern your use of this software only upon the condition that you accept all of the terms of either the Apache License.

You may obtain a copy of the Apache License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the Apache License or the GPL License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the Apache License for the specific language governing permissions and limitations under the Apache License.

Comments

Output of one Piano cover is a MIDI with 62 instruments in MuseScore

Hi, I input a cover of a piano song into the Python library, and opened the resulting MIDI file in MuseScore. The result was essentially a few notes played by 62 different pianos, despite the original performance being played on one.

Is this expected behaviour from basic-pitch? ~~Is it because I don't have the CUDA dependencies?~~ Installed CUDA and cuDNN.

If it is expected behaviour, I'm looking for some angles to attack this problem from on the MuseScore side and the Basic-Pitch side. Any help would be appreciated.

opened by kevinlinxc 11

Windows 10, Python 3.8, ModuleNotFoundError: No module named 'basic_pitch.predict'

I installed basic-pitch ok but when I run it I get:

PS C:\Users\y\Desktop> basic-pitch.exe .\Lifeblood '..\Downloads\That Petrol Emotion - Lifeblood -1986.mp3'
Traceback (most recent call last):
  File "C:\Users\y\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\y\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\y\AppData\Local\Programs\Python\Python38\Scripts\basic-pitch.exe\__main__.py", line 4, in <module>
ModuleNotFoundError: No module named 'basic_pitch.predict'

opened by mdwhitis 8

basic-pitch not working with most of the input audio files

I tried basic-pitch with 4 input audio files in mp3 format. Only 1 of them gave back an output; the others showed me the same message as below (Aborted). I then converted one of the files from .mp3 to .wav and tried again, and it still failed. Are there any limitations on the length or format of the file?

opened by VijayIyer 6

Unable to install basic-pitch on MacOS 12 and 13

It fails when I install basic-pitch on MacOS 12 and 13. I am on an Apple M1 Pro. I've attempted Python 3.8, 3.9, and 3.10, but they all fail. This issue also happened before I upgraded to macOS 13.

$ python -V
Python 3.8.14

$ pip -V
pip 22.3 from /Users/ndiyabongasekhomba/.pyenv/versions/3.8.14/lib/python3.8/site-packages/pip (python 3.8)

$ pip install basic-pitch --upgrade --no-cache-dir
Collecting basic-pitch
  Downloading basic-pitch-0.2.0.tar.gz (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 2.2 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting librosa>=0.8.0
  Downloading librosa-0.9.2-py3-none-any.whl (214 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 214.3/214.3 kB 31.6 MB/s eta 0:00:00
Collecting mir_eval>=0.6
  Downloading mir_eval-0.7.tar.gz (90 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 90.7/90.7 kB 30.5 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting pretty_midi>=0.2.9
  Downloading pretty_midi-0.2.9.tar.gz (5.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.6/5.6 MB 16.1 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting resampy>=0.2.2
  Downloading resampy-0.4.2-py3-none-any.whl (3.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.1/3.1 MB 52.9 MB/s eta 0:00:00
Collecting scipy>=1.4.1
  Downloading scipy-1.9.3-cp38-cp38-macosx_12_0_arm64.whl (28.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 28.5/28.5 MB 25.9 MB/s eta 0:00:00
Collecting basic-pitch
  Downloading basic_pitch-0.1.0-py2.py3-none-any.whl (373 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 373.6/373.6 kB 45.2 MB/s eta 0:00:00
Collecting typing-extensions
  Downloading typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Collecting numpy<1.20,>=1.19.2
  Downloading numpy-1.19.5.zip (7.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.3/7.3 MB 30.0 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting basic-pitch
  Downloading basic-pitch-0.0.1.tar.gz (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 42.5 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
ERROR: Cannot install basic-pitch==0.0.1, basic-pitch==0.1.0 and basic-pitch==0.2.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    basic-pitch 0.2.0 depends on tensorflow<2.10 and >=2.4.1
    basic-pitch 0.1.0 depends on tensorflow<2.7.0 and >=2.4.1
    basic-pitch 0.0.1 depends on tensorflow<2.7.0 and >=2.4.1

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

$ pip list installed
Package    Version
---------- -------
pip        22.3
setuptools 56.0.0

$ uname -a
Darwin NdiyaboombasMBP 22.1.0 Darwin Kernel Version 22.1.0: Sun Oct  9 20:15:09 PDT 2022; root:xnu-8792.41.9~2/RELEASE_ARM64_T6000 arm64

$ system_profiler SPSoftwareDataType SPHardwareDataType
Software:

    System Software Overview:

      System Version: macOS 13.0 (22A380)
      Kernel Version: Darwin 22.1.0
      Boot Volume: Macintosh HD
      Boot Mode: Normal
      Computer Name: Ndiyabonga Sekhomba's MacBook Pro
      User Name: Ndiyabonga Sekhomba (ndiyabongasekhomba)
      Secure Virtual Memory: Enabled
      System Integrity Protection: Enabled
      Time since boot: 15 hours, 58 minutes

Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro18,1
      Model Number: MK183ZE/A
      Chip: Apple M1 Pro
      Total Number of Cores: 10 (8 performance and 2 efficiency)
      Memory: 16 GB
      System Firmware Version: 8419.41.10
      OS Loader Version: 8419.41.10
      Serial Number (system): JKD7YDJX1N
      Hardware UUID: 133AD749-6093-50EB-81C7-D06792BDA20D
      Provisioning UDID: 00006000-001C095E3423801E
      Activation Lock Status: Enabled



opened by xndiyabongas 4

Keras.io tutorial?

Hey everyone! I’m Luke - I work full time on Keras. I think you have a really great core base here for audio processing. There aren’t a ton of code samples that use Keras for audio processing right now.

Would you guys be interested in contributing a keras.io tutorial showcasing some of your work?

opened by LukeWood 4

Could it handle realtime transcription?

This is more of a question than an issue, but could basic-pitch handle realtime transcription of audio (with low latency)?

It seems that it expects input in chunks of two seconds if I understand the code correctly, which suggests that you'd always have at least that much latency, but I could be wrong.

opened by tomduncalf 3

Issues installing basic-pitch on Python 3.10 with tensorflow 2.9.1

Hello,

I would like to install basic-pitch on a MacBook Pro running Monterey, with Python 3.10, pip3 and tensorflow 2.9.1, but the only supported tensorflow versions are >=2.4.1 and <2.7.0. Is there anything I can do?

Thanks, I can't wait to experiment with it! Kindest, Dom

opened by domvicinanza 2

AttributeError: module 'tensorflow.python.training.experimental.mixed_precision' has no attribute '_register_wrapper_optimizer_cls'

C:\Users\MohammedMehdiTBER>basic-pitch "C:\Users\MohammedMehdiTBER\Documents" "C:\Users\MohammedMehdiTBER\Downloads\mytrack.flac" --save-midi --multiple-pitch-bends

✨✨✨✨✨✨✨✨✨
✨ Basic Pitch  ✨
✨✨✨✨✨✨✨✨✨
Traceback (most recent call last):
  File "C:\Program Files\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python39\Scripts\basic-pitch.exe\__main__.py", line 7, in <module>
  File "C:\Program Files\Python39\lib\site-packages\basic_pitch\predict.py", line 107, in main
    from basic_pitch.inference import predict_and_save, verify_output_dir, verify_input_path
  File "C:\Program Files\Python39\lib\site-packages\basic_pitch\inference.py", line 125, in <module>
    audio_path: Union[pathlib.Path, str], model: keras.Model, debug_file: Optional[pathlib.Path] = None
  File "C:\Program Files\Python39\lib\site-packages\tensorflow\python\util\lazy_loader.py", line 62, in __getattr__
    module = self._load()
  File "C:\Program Files\Python39\lib\site-packages\tensorflow\python\util\lazy_loader.py", line 45, in _load
    module = importlib.import_module(self.__name__)
  File "C:\Program Files\Python39\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\Users\MohammedMehdiTBER\AppData\Roaming\Python\Python39\site-packages\keras\__init__.py", line 25, in <module>
    from keras import models
  File "C:\Users\MohammedMehdiTBER\AppData\Roaming\Python\Python39\site-packages\keras\models.py", line 20, in <module>
    from keras import metrics as metrics_module
  File "C:\Users\MohammedMehdiTBER\AppData\Roaming\Python\Python39\site-packages\keras\metrics.py", line 27, in <module>
    from keras import activations
  File "C:\Users\MohammedMehdiTBER\AppData\Roaming\Python\Python39\site-packages\keras\activations.py", line 20, in <module>
    from keras.layers import advanced_activations
  File "C:\Users\MohammedMehdiTBER\AppData\Roaming\Python\Python39\site-packages\keras\layers\__init__.py", line 24, in <module>
    from keras.engine.input_layer import Input
  File "C:\Users\MohammedMehdiTBER\AppData\Roaming\Python\Python39\site-packages\keras\engine\input_layer.py", line 21, in <module>
    from keras.engine import base_layer
  File "C:\Users\MohammedMehdiTBER\AppData\Roaming\Python\Python39\site-packages\keras\engine\base_layer.py", line 41, in <module>
    from keras.mixed_precision import loss_scale_optimizer
  File "C:\Users\MohammedMehdiTBER\AppData\Roaming\Python\Python39\site-packages\keras\mixed_precision\loss_scale_optimizer.py", line 1180, in <module>
    mixed_precision._register_wrapper_optimizer_cls(optimizer_v2.OptimizerV2,
AttributeError: module 'tensorflow.python.training.experimental.mixed_precision' has no attribute '_register_wrapper_optimizer_cls'

opened by MohammedMehdiTBER 2

Web version parameters vs function arguments

How do the web app parameters Note Segmentation and Model Confidence Threshold work? Is it possible to map their behavior using predict_and_save function arguments?

opened by neo-anderson 2

Open sourcing JavaScript version

Hey,

Really cool library, pretty impressive results.

I wondered if you had any plans to open source the JS version, or provide instructions on how to use the model with Tensorflow.js?

Thanks, Tom

opened by tomduncalf 2

Difference in pitch bend behavior

If you play back the output MIDI in VLC player and compare it with the rendered audio, the pitch bends sound different. The sonified audio on the web app and the corresponding MIDI also sound different from the output of the pip-installed version.

opened by neo-anderson 2

General code cleanup and many minor fixes

When using basic-pitch I encountered a few cases of dead code, redundant calculations, and unclear try-except blocks.

This PR is a collection of many single-line changes. I hope my attempt to fix some of these issues is welcome.

opened by gitpushoriginmaster 0

Add midi tempo CLI argument

Why

Resolves issue #40 requesting a CLI option of specifying the tempo of the created MIDI file.

This is my first contribution! Unsure if:

I should add some tests

The midi tempo is set in the correct way

Would happily get some input :)

What is changing

You are now able to use a CLI option to specify the tempo of the created MIDI file.

Example

basic-pitch <output-directory> <input-audio-path> --midi-tempo 75

opened by LukasGardberg 0

[feature request] get f0 pitch as 1D array instead of MIDI notes

It would be cool to get pitch output as an f0 1D array (with a corresponding time array) instead of MIDI notes with pitch bends. I want to compare the basic-pitch model's f0 estimations with models like crepe, pyin, spice and others.

opened by tandav 0

Human voice pitch detection

I haven't seen any mention of whether human voice pitch detection is covered, but since the claim is to be instrument-agnostic, I thought it better to give you a repro. If this is out of scope, please close, and sorry for the inconvenience.

I used the online version at https://basicpitch.spotify.com/ and fed it a clean, noiseless voice sample from https://samplefocus.com/samples/solo-voice-aah-solo. The result is far from being usable:

Is there a plan to support fundamental frequency detection?

opened by kasravi 2