LAKH MuseNet MIDI Dataset
Full LAKH MIDI dataset converted to MuseNet MIDI output format (9 instruments + drums)
Bonus: Choir on Channel 10
Please respect the CC BY-NC-SA license.
Make your own with the Colab notebook, or download the converted output here:
https://1drv.ms/u/s!Ao9gnMkvUA2KgZBWDIQJIG-JS6RpPQ?e=ur8ggN
Download with wget:
wget --no-check-certificate -O LAKH-MuseNet-MIDI-Dataset.zip "https://onedrive.live.com/download?cid=8A0D502FC99C608F&resid=8A0D502FC99C608F%2118518&authkey=AHkhwwRc0yg2QYY"
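Once the archive is downloaded, a quick sanity check is to list and extract the MIDI files it contains. The following is a minimal sketch using only the Python standard library. It assumes the download is a regular zip archive saved under the file name used in the wget command above; the helper names (`list_midi_members`, `extract_archive`) and the output folder name are hypothetical, and the internal folder layout of the archive is not documented here, so the code searches recursively instead of relying on a fixed path.

```python
import zipfile
from pathlib import Path

# File extensions treated as MIDI (compared case-insensitively).
MIDI_SUFFIXES = {".mid", ".midi"}


def list_midi_members(zip_path):
    """Return the names of archive members that look like MIDI files."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            name
            for name in archive.namelist()
            if Path(name).suffix.lower() in MIDI_SUFFIXES
        ]


def extract_archive(zip_path, out_dir):
    """Extract the whole archive into out_dir and return sorted MIDI paths."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    return sorted(
        path
        for path in out_dir.rglob("*")
        if path.is_file() and path.suffix.lower() in MIDI_SUFFIXES
    )


if __name__ == "__main__":
    # Assumed file name, matching the -O argument of the wget command.
    zip_path = Path("LAKH-MuseNet-MIDI-Dataset.zip")
    if zip_path.exists():
        members = list_midi_members(zip_path)
        print(f"{len(members)} MIDI files found in {zip_path}")
        # Hypothetical output folder; change it to suit your setup.
        extracted = extract_archive(zip_path, "LAKH-MuseNet-MIDI-Dataset")
        print(f"Extracted {len(extracted)} MIDI files")
    else:
        print(f"{zip_path} not found; run the wget command first")
```

Listing the members before extracting is cheap and shows how many files to expect, which is useful because the full dataset is large.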
Source license/attribution (quoted from the Lakh MIDI Dataset page by Colin Raffel)
The Lakh MIDI Dataset is distributed with a CC-BY 4.0 license; if you use this data in any capacity, please reference this page and my thesis:
Colin Raffel. "Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching". PhD Thesis, 2016.
Of course, I did not transcribe any of the MIDI files in the Lakh MIDI Dataset. While MIDI files have a built-in mechanism for attribution (the Copyright meta-event), it is not used consistently, so attributing each of the MIDI files in the dataset to a particular author is not feasible. If you'd like to try, here is a list of the text of all of the Copyright meta-events in the Lakh MIDI Dataset.
If you use the Million Song Dataset, please reference this paper:
Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. "The Million Song Dataset". In Proceedings of the 12th International Society for Music Information Retrieval Conference, pages 591–596, 2011.