Music Demixing Challenge - xumx-sliCQ
This repository is the GitHub mirror of my working submission repository for the AICrowd ISMIR 2021 Music Demixing Challenge (MDX): https://gitlab.aicrowd.com/sevagh/music-demixing-challenge-starter-kit
Links related to my submission:
- Clean repository for my neural network: https://github.com/sevagh/xumx-sliCQ. To make submissions, I copied the code and trained models from the xumx-sliCQ project into this one
- PyTorch sliCQ Transform: https://github.com/sevagh/nsgt
- My post-competition presentation: YouTube recording ▶️, slides 📚
Here's a summary of (what I consider to be) the interesting tagged submissions, from newest to oldest:
- Wiener-EM on zero-padded sliCQ, still too slow: https://github.com/sevagh/music-demixing-challenge-ismir-2021/commit/9e9f80c5664bad154a56a5bf885d3584e4e8bd5e
- Wiener-EM on sliCQ, too slow: https://github.com/sevagh/music-demixing-challenge-ismir-2021/commit/ebae8aa24979eece02cf88ffae01e991d7410965
- Dilated convolutions for faster inference: https://github.com/sevagh/music-demixing-challenge-ismir-2021/commit/bfbccf9323692da6b3ebf623d1b0366b65ba50c3
- Bandwidth model, where frequency bins above 16000 Hz are ignored (those sliCQT bins pass through the network unmodified): https://github.com/sevagh/music-demixing-challenge-ismir-2021/commit/b80d56266b9a630775a44843c4a124227a20a738
- First time switching to CrossNet-UMX (X-UMX): https://github.com/sevagh/music-demixing-challenge-ismir-2021/commit/d2784c89d217f12b66422121db0eaa7ed3c751ca
- Use Wiener-EM with the STFT instead of no EM step: https://github.com/sevagh/music-demixing-challenge-ismir-2021/commit/55f85db110aa57fbe3ab21247d76e3e82b67d544
- One of the very first successful models, pre-XUMX, called "umx-sliCQ": https://github.com/sevagh/music-demixing-challenge-ismir-2021/commit/2cee876dc092a8c52f249440cadf018b6b16cbe3
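The bandwidth idea from the list above (bins above 16000 Hz bypass the network) can be illustrated with a minimal pure-Python sketch. This is not the code from the linked commit: the function name `apply_below_cutoff`, the example bin frequencies, and the stand-in `halve` "network" are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence


def apply_below_cutoff(
    frame: Sequence[float],
    bin_freqs_hz: Sequence[float],
    network: Callable[[List[float]], List[float]],
    cutoff_hz: float = 16000.0,
) -> List[float]:
    """Run `network` only on bins at or below `cutoff_hz`.

    Bins above the cutoff are copied to the output unmodified.
    """
    if len(frame) != len(bin_freqs_hz):
        raise ValueError("frame and bin_freqs_hz must have the same length")

    # Indices of the bins the network is allowed to modify.
    low_idx = [i for i, f in enumerate(bin_freqs_hz) if f <= cutoff_hz]
    processed = network([frame[i] for i in low_idx])
    if len(processed) != len(low_idx):
        raise ValueError("network must preserve the number of bins")

    out = list(frame)  # start from a copy so high bins pass through
    for i, value in zip(low_idx, processed):
        out[i] = value
    return out


# Hypothetical bin center frequencies and magnitudes for one time frame.
bin_freqs = [100.0, 1000.0, 8000.0, 16000.0, 18000.0, 22050.0]
frame = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]


def halve(xs: List[float]) -> List[float]:
    """Stand-in for a neural network: scales every input bin by 0.5."""
    return [0.5 * x for x in xs]


result = apply_below_cutoff(frame, bin_freqs, halve)
print(result)
```

In this sketch the last two bins (18000 Hz and 22050 Hz) keep their input values, while the lower bins are replaced by the stand-in network's output.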
I tried many ideas over the course of the competition. I discarded some as hard to explain or train, but they might still be worthwhile. You can see a "scrapyard" of my various abandoned ideas: https://gitlab.com/sevagh/xumx_slicq_extra/-/tree/main/umx_experiments. Of these, my favorites are:
- 3D convolutions on the (slice x time x frequency) sliCQT, versus overlap-adding into ((slice x time) x frequency): https://gitlab.com/sevagh/xumx_slicq_extra/-/tree/main/umx_experiments/umx-sliCQ-conv3d-orig-branch
- Different sliCQ parameters per target: https://gitlab.com/sevagh/xumx_slicq_extra/-/tree/main/umx_experiments/umx-sliCQ-lstm-branch, https://gitlab.com/sevagh/xumx_slicq_extra/-/tree/main/umx_experiments/umx-sliCQ-first-submission - this would also be compatible with doing the Wiener-EM with the STFT (since the iterative EM cannot be applied to 4 different sliCQTs; the CrossNet loss also won't work with 4 different sliCQTs)
- LSTM instead of convolutions: https://gitlab.com/sevagh/xumx_slicq_extra/-/tree/main/umx_experiments/umx-sliCQ-lstm-branch
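To make the shapes in the 3D-convolution item concrete, here is a minimal pure-Python sketch of overlap-adding a (slice x time x frequency) array into a 2D (time x frequency) array. It assumes adjacent slices overlap by 50% (hop of half the slice length); that hop, the function name `overlap_add`, and the toy all-ones input are assumptions for illustration, not the repository's implementation. A 3D convolution would instead operate on the nested array directly, without this flattening step.

```python
from typing import List, Optional

Slices = List[List[List[float]]]  # indexed as [slice][time][frequency]


def overlap_add(slices: Slices, hop: Optional[int] = None) -> List[List[float]]:
    """Overlap-add slices along time into a single (time x frequency) array.

    `hop` defaults to half the per-slice time length (assumed 50% overlap).
    """
    n_slices = len(slices)
    n_time = len(slices[0])
    n_freq = len(slices[0][0])
    if hop is None:
        hop = n_time // 2

    total_time = hop * (n_slices - 1) + n_time
    out = [[0.0] * n_freq for _ in range(total_time)]

    for s, one_slice in enumerate(slices):
        start = s * hop  # where this slice begins on the flattened time axis
        for t, row in enumerate(one_slice):
            for f, value in enumerate(row):
                out[start + t][f] += value
    return out


# Toy input: 3 slices, 4 time frames per slice, 2 frequency bins, all ones.
toy = [[[1.0, 1.0] for _ in range(4)] for _ in range(3)]
flat = overlap_add(toy)
print(len(flat), [row[0] for row in flat])
```

With the toy input, frames covered by two slices sum to 2.0 and the frames at the two edges stay at 1.0, which shows where the overlapping regions are merged.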
📎 Important info
- 💪 Challenge Page: https://www.aicrowd.com/challenges/music-demixing-challenge-ismir-2021
- 🗣️ Discussion Forum: https://www.aicrowd.com/challenges/music-demixing-challenge-ismir-2021/discussion
- 🏆 Leaderboard: https://www.aicrowd.com/challenges/music-demixing-challenge-ismir-2021/leaderboards
Contributors