GiantMIDI-Piano is a classical piano MIDI dataset that contains 10,854 MIDI files by 2,786 composers

Bytedance Inc.

Last update: Jan 4, 2023

Overview

GiantMIDI-Piano

GiantMIDI-Piano [1] is a classical piano MIDI dataset that contains 10,854 MIDI files by 2,786 composers. A curated subset, obtained by requiring the composer's surname to appear in the YouTube title, contains 7,236 MIDI files by 1,787 composers. The MIDI files in GiantMIDI-Piano are transcribed from live recordings with a high-resolution piano transcription system [2].

A demo of GiantMIDI-Piano is available at https://www.youtube.com/watch?v=5U-WL0QvKCg

The transcribed MIDI files of GiantMIDI-Piano can be previewed in the midis_preview directory.
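The curation rule described in the overview can be illustrated with a minimal sketch. It assumes that an entry belongs to the curated subset when every word of the composer's surname appears as a whole word in the YouTube title, compared case-insensitively. The function name `in_curated_subset` is hypothetical, and the repository's actual matching logic (tokenization, accents, name particles) may differ.

```python
import re


def in_curated_subset(youtube_title: str, surname: str) -> bool:
    """Hypothetical curation check: True if every word of the composer's
    surname appears as a whole word in the YouTube title (case-insensitive)."""
    # \w+ splits on punctuation and spaces; in Python 3 it also matches
    # accented letters such as "é".
    title_words = set(re.findall(r"\w+", youtube_title.lower()))
    surname_words = re.findall(r"\w+", surname.lower())
    # An empty surname never qualifies.
    return bool(surname_words) and all(w in title_words for w in surname_words)


# A title taken from the download log in the Comments section below:
print(in_curated_subset("Mind-Boggling Off-Grid FLOATING Island HOMESTEAD", "Aadler"))
```

Under this rule, an off-topic search hit such as the one in the example is excluded because the surname never appears in its title.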

Download GiantMIDI-Piano

Method 1 (suggested)

Follow disclaimer.md to agree to the disclaimer and download a stable version of GiantMIDI-Piano (193 MB). The download also includes the curated subset, in which the YouTube title of every recording must contain the composer's surname.

Method 2

Alternatively, users can obtain GiantMIDI-Piano by downloading all audio recordings and transcribing them into MIDI files, following the remaining sections of this README. Transcription takes ~200 hours on a single GPU card.

Install requirements

Install PyTorch (>=1.4) following https://pytorch.org/.

pip install -r requirements.txt

Download audio recordings

Download the audio recordings from YouTube with the following scripts. Approximately 10,854 audio recordings can be downloaded; some recordings may no longer be available.

WORKSPACE="./workspace"
mkdir -p $WORKSPACE
cp "resources/full_music_pieces_youtube_similarity_pianosoloprob.csv" $WORKSPACE/"full_music_pieces_youtube_similarity_pianosoloprob.csv"

# Download all mp3s. Users could split the downloading into parts to speed up the downloading. E.g.,
python3 dataset.py download_youtube_piano_solo --workspace=$WORKSPACE --begin_index=0 --end_index=30000
python3 dataset.py download_youtube_piano_solo --workspace=$WORKSPACE --begin_index=30000 --end_index=60000
python3 dataset.py download_youtube_piano_solo --workspace=$WORKSPACE --begin_index=60000 --end_index=90000
python3 dataset.py download_youtube_piano_solo --workspace=$WORKSPACE --begin_index=90000 --end_index=120000
python3 dataset.py download_youtube_piano_solo --workspace=$WORKSPACE --begin_index=120000 --end_index=150000
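The split above partitions the CSV row indices into contiguous ranges. The sketch below generates the same commands for any chunk size. It assumes `--end_index` is exclusive, as the back-to-back ranges suggest. The helper names are hypothetical, and the upper bound of 150000 is simply taken from the commands above.

```python
from typing import List, Tuple


def index_ranges(total: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, total) into contiguous half-open ranges of at most chunk_size rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(begin, min(begin + chunk_size, total)) for begin in range(0, total, chunk_size)]


def download_commands(total: int, chunk_size: int, workspace: str = "$WORKSPACE") -> List[str]:
    """One download command per range (assumes --end_index is exclusive)."""
    return [
        "python3 dataset.py download_youtube_piano_solo "
        f"--workspace={workspace} --begin_index={begin} --end_index={end}"
        for begin, end in index_ranges(total, chunk_size)
    ]


if __name__ == "__main__":
    for command in download_commands(total=150000, chunk_size=30000):
        print(command)
```

Each printed line can be run in its own terminal so that the parts download concurrently.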

The downloaded mp3 files look like:

mp3s_piano_solo (10,854 files)
├── Aaron, Michael, Piano Course, V8WvKK-1b2c.mp3
├── Aarons, Alfred E., Brother Bill, Giet2Krl6Ww.mp3
└── ...
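The file names appear to follow the pattern `Surname, First names, Title, YouTubeID.ext`. The following minimal parsing sketch rests on that assumption. The first two comma-separated fields are taken as the composer's surname and first names, and the last field is taken as the YouTube video ID. Everything in between is treated as the title, which may itself contain commas. The `parse_filename` helper and `PieceInfo` type are hypothetical and are not part of the repository.

```python
from pathlib import Path
from typing import NamedTuple


class PieceInfo(NamedTuple):
    surname: str
    first_name: str
    title: str
    youtube_id: str


def parse_filename(filename: str) -> PieceInfo:
    """Parse 'Surname, First names, Title, YouTubeID.ext' (assumed naming pattern)."""
    stem = Path(filename).stem  # drop the .mp3 / .mid extension
    surname, first_name, rest = stem.split(", ", 2)
    # The video ID is the last field; the title may contain ", " itself.
    title, youtube_id = rest.rsplit(", ", 1)
    return PieceInfo(surname, first_name, title, youtube_id)


print(parse_filename("Aarons, Alfred E., Brother Bill, Giet2Krl6Ww.mp3"))
```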

Transcribe audio to MIDI files

# Transcribe all mp3s to midi files. Users could split the transcription into parts to speed up the transcription. E.g.,
python3 audios_to_midis.py transcribe_piano --workspace=$WORKSPACE --mp3s_dir=$WORKSPACE"/mp3s_piano_solo" --midis_dir=$WORKSPACE"/midis" --begin_index=0 --end_index=30000
python3 audios_to_midis.py transcribe_piano --workspace=$WORKSPACE --mp3s_dir=$WORKSPACE"/mp3s_piano_solo" --midis_dir=$WORKSPACE"/midis" --begin_index=30000 --end_index=60000
python3 audios_to_midis.py transcribe_piano --workspace=$WORKSPACE --mp3s_dir=$WORKSPACE"/mp3s_piano_solo" --midis_dir=$WORKSPACE"/midis" --begin_index=60000 --end_index=90000
python3 audios_to_midis.py transcribe_piano --workspace=$WORKSPACE --mp3s_dir=$WORKSPACE"/mp3s_piano_solo" --midis_dir=$WORKSPACE"/midis" --begin_index=90000 --end_index=120000
python3 audios_to_midis.py transcribe_piano --workspace=$WORKSPACE --mp3s_dir=$WORKSPACE"/mp3s_piano_solo" --midis_dir=$WORKSPACE"/midis" --begin_index=120000 --end_index=150000

The transcribed MIDI files look like:

midis (10,854 files)
├── Aaron, Michael, Piano Course, V8WvKK-1b2c.mid
├── Abel, Frederic, Lola Polka, SLNJF0uiqRw.mid
└── ...
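Each MIDI file keeps the stem of its mp3. This means you can find the recordings that still need transcribing, for example after an interrupted run, by comparing file stems across the two directories. The sketch below assumes the `mp3s_piano_solo` and `midis` layout shown above. `find_untranscribed` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def find_untranscribed(mp3s_dir: str, midis_dir: str) -> List[Path]:
    """Return sorted mp3 paths that have no .mid file with the same stem."""
    midi_stems = {p.stem for p in Path(midis_dir).glob("*.mid")}
    return sorted(p for p in Path(mp3s_dir).glob("*.mp3") if p.stem not in midi_stems)


if __name__ == "__main__":
    missing = find_untranscribed("workspace/mp3s_piano_solo", "workspace/midis")
    print(f"{len(missing)} recordings still need transcription")
```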

The transcription of all audio recordings may take around 10 days on a single GPU card.
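Here is a rough back-of-the-envelope check. The ~200 GPU-hours quoted in Method 2, spread over 10,854 recordings, come to about 66 seconds of GPU time per recording. The second function assumes the parts from the commands above take roughly equal time on separate GPUs, so wall-clock time shrinks in proportion. This is an idealization, since recording lengths vary.

```python
def seconds_per_recording(total_gpu_hours: float, num_recordings: int) -> float:
    """Average GPU time per recording implied by a total time budget."""
    return total_gpu_hours * 3600 / num_recordings


def wall_clock_hours(total_gpu_hours: float, num_gpus: int) -> float:
    """Idealized wall-clock time when the work is split evenly across num_gpus."""
    return total_gpu_hours / num_gpus


print(round(seconds_per_recording(200, 10854)))  # about 66 seconds per recording
print(wall_clock_hours(200, 5))                  # 40.0 hours with five parallel parts
```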

Details of the scripts can be found in the scripts directory.

Analyze the statistics of GiantMIDI-Piano

All statistics and figures in [1] can be reproduced by:

./scripts/3_statistics.sh
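For a quick look at one statistic, the number of pieces per composer, you can count files by the composer fields in their names. This minimal sketch assumes the `Surname, First names, Title, YouTubeID` naming pattern shown earlier. It is not how `3_statistics.sh` computes its figures, and `pieces_per_composer` is a hypothetical name.

```python
from collections import Counter
from pathlib import Path


def pieces_per_composer(midis_dir: str) -> Counter:
    """Count .mid files per composer, keyed by 'Surname, First names'."""
    counts: Counter = Counter()
    for path in Path(midis_dir).glob("*.mid"):
        fields = path.stem.split(", ")
        if len(fields) >= 4:  # skip names that do not match the assumed pattern
            counts[f"{fields[0]}, {fields[1]}"] += 1
    return counts


if __name__ == "__main__":
    for composer, n in pieces_per_composer("workspace/midis").most_common(10):
        print(f"{n:5d}  {composer}")
```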

FAQ

If users see "Too many requests! Sleep for 3600 s" when downloading, YouTube has limited the number of videos that can be downloaded. Users can either 1) wait until YouTube unblocks their IP (from one day to a few weeks), or 2) download from another machine with a different IP.
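The "Sleep for 3600 s" message suggests that the download script waits and retries when it is rate-limited. A bounded retry loop of that kind can be sketched as follows. `download_one`, `RateLimitedError`, and the retry policy are hypothetical and are not taken from `dataset.py`. The `sleep` function is injectable, so you can test the logic without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical error raised when YouTube answers 'Too many requests'."""


def retry_when_rate_limited(
    download_one: Callable[[], T],
    max_attempts: int = 5,
    wait_seconds: float = 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call download_one, sleeping wait_seconds after each rate-limit error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return download_one()
        except RateLimitedError:
            if attempt == max_attempts:
                raise  # give up: wait longer or switch to a machine with another IP
            print(f"Too many requests! Sleep for {wait_seconds} s")
            sleep(wait_seconds)
    raise AssertionError("unreachable")
```

Capping the attempts matters because, as noted above, a block can last from a day to several weeks. Retrying forever would just stall the job.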

Contact

Qiuqiang Kong, [email protected]

Cite

[1] Qiuqiang Kong, Bochen Li, Jitong Chen, and Yuxuan Wang. "GiantMIDI-Piano: A large-scale MIDI dataset for classical piano music." arXiv preprint arXiv:2010.07061 (2020). https://arxiv.org/pdf/2010.07061

License

CC BY 4.0

Comments

Error when processing stereo audio

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-6634fd8f8eb4> in <module>
----> 1 dict = tran.transcribe(audio, "Ascended Vibrations.mid")

A:\***\piano_transcription_inference\inference.py in transcribe(self, audio, midi_path)
     80             * self.segment_samples - audio_len
     81
---> 82         audio = np.concatenate((audio, np.zeros((1, pad_len))), axis=1)
     83
     84         # Enframe to segments

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 3 dimension(s) and the array at index 1 has 2 dimension(s)

opened by Creepercdn 0
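The traceback shows the audio array has one more dimension than the padding array it is concatenated with. This is consistent with a stereo (multi-channel) array being passed where a 1-D mono signal is expected. One workaround, assuming the transcriber accepts a 1-D NumPy array of samples, is to downmix to mono before calling `transcribe`. Loading the file with `librosa.load(path, sr=..., mono=True)` should have the same effect. The `to_mono` helper below is a hypothetical sketch.

```python
import numpy as np


def to_mono(audio: np.ndarray) -> np.ndarray:
    """Downmix a (channels, samples) or (samples, channels) array to 1-D mono.

    For 2-D input, the shorter axis is assumed to be the channel axis.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        return audio
    if audio.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {audio.shape}")
    channel_axis = 0 if audio.shape[0] <= audio.shape[1] else 1
    return audio.mean(axis=channel_axis)  # average the channels
```

With this helper, the call in the traceback becomes `tran.transcribe(to_mono(audio), "Ascended Vibrations.mid")`.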

updated mp3 download and audio2midi scripts to ignore missing files

As discussed in Issue https://github.com/bytedance/GiantMIDI-Piano/issues/6 of the main repo, as of April 2022 some YouTube videos are missing, and the download and transcription scripts crash on them.

This PR adapts the scripts to ignore missing entries. Tested on Ubuntu 20.04 with a recent CUDA version and the given dependencies.

opened by andres-fr 0
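The idea behind this PR is to skip entries whose recordings are missing instead of crashing. It can be sketched generically as below. `transcribe_all` and `transcribe_fn` are hypothetical stand-ins, not the PR's actual code.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def transcribe_all(
    mp3_paths: Iterable[str],
    transcribe_fn: Callable[[Path], None],
) -> List[Path]:
    """Transcribe each existing mp3; return the paths skipped because they are missing."""
    skipped: List[Path] = []
    for mp3_path in map(Path, mp3_paths):
        if not mp3_path.is_file():
            print(f"Skipping missing recording: {mp3_path}")
            skipped.append(mp3_path)
            continue
        transcribe_fn(mp3_path)
    return skipped
```

Returning the skipped paths, rather than only printing them, means you can log or retry them once the run finishes.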

Error when downloading MP3 audio data

Hi!

Thanks for this amazing work.

I've encountered a small issue while downloading the audio data. My first impression is that it is related to files not being available anymore. Here is the log:

python3 dataset.py download_youtube_piano_solo --workspace=$WORKSPACE --begin_index=0 --end_index=30000
[nltk_data] Downloading package punkt to /home/aferro/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/aferro/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
0; Jag A.; Je t'aime Juliette; Je t'aime Juliette - A. Jag
ERROR: Video unavailable

[]
1; C. A. Aadler; Floating Islands; Mind-Boggling Off-Grid FLOATING Island HOMESTEAD
Traceback (most recent call last):
  File "dataset.py", line 660, in <module>
    download_youtube_piano_solo(args)
  File "dataset.py", line 565, in download_youtube_piano_solo
    if float(meta_dict['piano_solo_prob'][n]) >= 0.5:
ValueError: could not convert string to float: ''

As we can see, the meta_dict['piano_solo_prob'][n] entry is expected to yield a string that can be cast to a float, i.e. something like "0.12345". But sometimes it yields empty strings, which cannot be cast to floats.

Without analyzing too much of the code, a possible fix could be the following:

try:
    prob = float(meta_dict['piano_solo_prob'][n])
except ValueError as ve:
    print("SKIPPING ENTRY DUE TO ERROR:", ve)
    n += 1
    continue

if prob >= 0.5:
    count += 1
    ...etc

So far this seems to run OK on my end, yielding the desired audio data and logs under "$WORKSPACE", but I'm not sure whether we're supposed to ignore the empty entries, or rather fix them so that no empty entries are provided. What do you think? If this looks OK, feel free to use the code, or let me know if you'd like me to open a PR.

Cheers,
Andres

opened by andres-fr 5
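The try/except fix proposed in this issue can be written as a self-contained helper. It treats an empty or malformed `piano_solo_prob` value as invalid and skips that entry. This is a sketch with several assumptions. `parse_prob` and `select_piano_solo_indices` are hypothetical names. `meta_dict` is modeled as a dict of lists of strings to match the log above. The 0.5 threshold is the one visible in the traceback.

```python
from typing import Dict, List, Optional


def parse_prob(value: str) -> Optional[float]:
    """Return value as a float, or None if it is empty or not a number."""
    try:
        return float(value)
    except ValueError:
        return None


def select_piano_solo_indices(
    meta_dict: Dict[str, List[str]], threshold: float = 0.5
) -> List[int]:
    """Indices n whose piano_solo_prob parses and is >= threshold."""
    selected: List[int] = []
    for n, raw in enumerate(meta_dict["piano_solo_prob"]):
        prob = parse_prob(raw)
        if prob is None:
            print(f"Skipping entry {n}: invalid piano_solo_prob {raw!r}")
            continue
        if prob >= threshold:
            selected.append(n)
    return selected
```

Iterating with `enumerate` avoids the manual `n += 1` bookkeeping before `continue`, which is easy to get wrong in a `while` loop.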

Illegal instruction

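To run it on Android, as shown in the screenshot, I set up a Linux container and installed all the dependencies, but the first time I ran it, it said:

Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible
Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present
Error in cpuinfo: failed to parse both lists of possible and present processors

The Linux container does not have these two files at all, so I created them manually and filled in their contents. When I ran it again, I got "Illegal instruction". Maybe this is an Android quirk (?). Is there any solution?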

opened by Tweak-1600 1