KUIELab-MDX-Net took 2nd place on Leaderboard A and 3rd place on Leaderboard B in the MDX Challenge (ISMIR 2021)

Overview

KUIELab-MDX-Net

0. Environment

  • Ubuntu 20.04
  • at least four CUDA-capable GPUs (each a 2080 Ti or better)
  • 1.5 TB disk storage for data augmentation
  • wandb for logging

You must also create a .env file by copying .env.sample, then set the environment variables in it.

wandb_api_key=[Your Key] # "xxxxxxxxxxxxxxxxxxxxxxxx"
data_dir=[Your Path] # "/home/ielab/repos/musdbHQ"
  • about wandb_api_key
    • Only wandb is currently supported for logging.
    • To get your wandb_api_key, visit wandb, open Settings, and copy your API key.
  • about data_dir
    • The absolute path where the datasets are stored.
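
Before starting a long run, it can help to sanity-check the .env file. The following is a minimal sketch, assuming the file uses the KEY=VALUE format with trailing "#" comments shown above. The helper names parse_env and check_env are hypothetical and are not part of this repository, and the project itself may load the file differently.

```python
from pathlib import PurePosixPath


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and trailing '# ...' comments.

    Simplification: a '#' inside a value is treated as the start of a comment.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip('"')
    return values


def check_env(values):
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    if not values.get("wandb_api_key"):
        problems.append("wandb_api_key is missing or empty")
    data_dir = values.get("data_dir", "")
    if not data_dir:
        problems.append("data_dir is missing or empty")
    elif not PurePosixPath(data_dir).is_absolute():
        # The README asks for an absolute path (Ubuntu, so POSIX-style).
        problems.append("data_dir should be an absolute path")
    return problems


# Example input mirroring the two variables documented above.
sample = 'wandb_api_key=abc123 # your key\ndata_dir=/home/ielab/repos/musdbHQ\n'
parsed = parse_env(sample)
problems = check_env(parsed)
print(problems or "looks fine")
```

To check a real file, read it with `open(".env").read()` and pass the text to parse_env.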

1. Installation

conda env create -f conda_env_gpu.yaml -n mdx-net
conda activate mdx-net
pip install -r requirements.txt
sudo apt-get install soundstretch

2. Training & Submission

3. Leaderboard A vs Leaderboard B

ACKNOWLEDGEMENT

Comments
  • Encountered errors while executing training process #2


    (Using Leaderboard_B) First, conda got stuck solving the environment. I let it sit for 30 minutes, but it never finished creating the env from the yml file. Because I was using a cloud instance and could not afford to wait, I did this instead:

    conda create -n mdx-net
    conda update conda
    conda config --add channels conda-forge
    conda activate mdx-net
    sudo apt-get install soundstretch
    python -m pip install -r requirements.txt
    python src/utils/data_augmentation.py --data_dir /real/path/to/musdbhq/ --train True --test True
    

    It seems that the model doesn't allow me to train it with songs that don't contain vocals.

    python src/utils/data_augmentation.py --data_dir /home/ubuntu/mdx-files/musdb/ --train True --test True
     10%|███████████████▉                                                                                                                                                     | 11/114 [01:13<11:25,  6.65s/it]
    Traceback (most recent call last):
      File "src/utils/data_augmentation.py", line 111, in <module>
        main(parser.parse_args())
      File "src/utils/data_augmentation.py", line 30, in main
        save_shifted_dataset(p, t, data_dir, 'train')
      File "src/utils/data_augmentation.py", line 92, in save_shifted_dataset
        source = load_wav(in_path.joinpath(s_name+'.wav'))
      File "src/utils/data_augmentation.py", line 102, in load_wav
        return sf.read(path, samplerate=sr, dtype='float32')[0].T
      File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 256, in read
        with SoundFile(file, 'r', samplerate, channels,
      File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 629, in __init__
        self._file = self._open(file, mode_int, closefd)
      File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 1183, in _open
        _error_check(_snd.sf_error(file_ptr),
      File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 1357, in _error_check
        raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
    RuntimeError: Error opening '/home/ubuntu/mdx-files/musdb/train/Artificial Intelligence - Native Instruments/vocals.wav': System error.
    

    I deleted the songs that didn't contain vocals, after which the data augmentation succeeded. However, every attempt to train failed, and I didn't have time to debug on the cloud GPU instance.
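
The traceback above fails while opening a vocals.wav file for one track, so it may be worth listing incomplete track folders before running the augmentation. This is a minimal sketch, assuming each track folder is expected to hold vocals.wav, drums.wav, bass.wav, and other.wav. The stem list and the function name find_incomplete_tracks are assumptions for illustration, not code from this repository.

```python
import tempfile
from pathlib import Path

# Assumed stem names; adjust if your dataset layout differs.
STEMS = ("vocals", "drums", "bass", "other")


def find_incomplete_tracks(split_dir, stems=STEMS):
    """Map each track folder name to the stem files it is missing."""
    missing = {}
    for track in sorted(Path(split_dir).iterdir()):
        if not track.is_dir():
            continue
        absent = [f"{s}.wav" for s in stems if not (track / f"{s}.wav").is_file()]
        if absent:
            missing[track.name] = absent
    return missing


# Small self-contained demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    train = Path(tmp) / "train"
    for name, stems in {"Track A": STEMS, "Track B": ("drums", "bass", "other")}.items():
        folder = train / name
        folder.mkdir(parents=True)
        for stem in stems:
            (folder / f"{stem}.wav").touch()
    report = find_incomplete_tracks(train)

print(report)
```

On a real dataset, call find_incomplete_tracks on the train folder inside your data_dir and inspect the result before starting the augmentation.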

    Here is the output from: python run.py experiment=multigpu_other model=ConvTDFNet_other

    /usr/lib/python3/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /usr/lib/python3/dist-packages/torchvision/image.so: undefined symbol: _ZNK3c106IValue23reportToTensorTypeErrorEv
      warn(f"Failed to load image Python extension: {e}")
    Traceback (most recent call last):
      File "run.py", line 7, in <module>
        from pytorch_lightning.utilities import rank_zero_info
      File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
        from pytorch_lightning import metrics  # noqa: E402
      File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
        from pytorch_lightning.metrics.classification import (  # noqa: F401
      File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
        from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
      File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 16, in <module>
        from torchmetrics import Accuracy as _Accuracy
      File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/__init__.py", line 14, in <module>
        from torchmetrics import functional  # noqa: E402
      File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/__init__.py", line 14, in <module>
        from torchmetrics.functional.audio.pit import permutation_invariant_training, pit, pit_permutate
      File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/audio/__init__.py", line 26, in <module>
        from torchmetrics.functional.audio.pesq import perceptual_evaluation_speech_quality  # noqa: F401
      File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/audio/pesq.py", line 20, in <module>
        import pesq as pesq_backend
      File "/home/ubuntu/.local/lib/python3.8/site-packages/pesq/__init__.py", line 5, in <module>
        from ._pesq import pesq, pesq_batch
      File "/home/ubuntu/.local/lib/python3.8/site-packages/pesq/_pesq.py", line 8, in <module>
        from .cypesq import cypesq, cypesq_retvals, cypesq_error_message as pesq_error_message
      File "__init__.pxd", line 238, in init cypesq
    ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 80 from PyObject
    
    
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA A100-PCI...  On   | 00000000:07:00.0 Off |                    0 |
    | N/A   35C    P0    36W / 250W |      0MiB / 40960MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   1  NVIDIA A100-PCI...  On   | 00000000:08:00.0 Off |                    0 |
    | N/A   34C    P0    33W / 250W |      0MiB / 40960MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
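
The ValueError at the end of the traceback above ("numpy.ndarray size changed, may indicate binary incompatibility") usually means that a compiled extension, here the one inside pesq, was built against a different NumPy version than the one currently installed. A first diagnostic step is to list the installed versions of the packages involved. The sketch below only reports versions; the package list is taken from the traceback, and the helper name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages that appear in the traceback above.
report = installed_versions(["numpy", "pesq", "torchmetrics", "pytorch-lightning"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

If the versions differ from those pinned in requirements.txt, reinstalling the pinned set in a clean environment is a reasonable thing to try first.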
    
    opened by Ma5onic 8
  • mskim revision


    • [x] Please check that the class below is designed correctly.

    https://github.com/kuielab/mdx-net/blob/b9eb84de886143dcb4842ad2daa613ac0a8f6308/src/models/mdxnet.py#L128

    question 
    opened by ws-choi 3
  • Separation without training


    Hello, is it possible to run separation using KUIELab-MDX-Net without training the model from scratch? Do you share any pretrained models? That would make it easier to evaluate your solution and to use it in a real-world scenario.

    opened by kichel98 2
  • Data Augmentation


    • [x] implement test code for the data.zip below

    Originally posted by @rlaalstjr47 in https://github.com/kuielab/mdx-net/issues/3#issuecomment-873558168

    TODO

    • [ ] gen script with hydra
    • [x] auto metadata caching for data augmentation
    • [ ] parameterized data aug with hydra
    • [ ] debugging to check if it is well-designed
    enhancement 
    opened by ws-choi 2
  • How do you save the mixer model?


    Hi!

    I am trying to train the mixer model, but it only saves a .ckpt file of around 58 MB. When I run predict_blend with my mixer checkpoint, I get this error:

    RuntimeError: Error(s) in loading state_dict for Mixer: Missing key(s) in state_dict: "linear.weight". Unexpected key(s) in state_dict: "epoch", "global_step", "pytorch-lightning_version", "state_dict", "hparams_name", "hyper_parameters".

    In your repo the mixer model is very small, and it says to "save .pt, the only learnable parameters in Mixer". Could you tell me how to do this, please?

    Thanks!
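
The error message above suggests that the loader expects a plain state dict containing "linear.weight", while the Lightning checkpoint nests the weights under a "state_dict" key next to training metadata. One possible approach is to pull out that nested dict and save it on its own. This is a minimal sketch and not the authors' documented procedure: the helper names are hypothetical, and whether the keys in your checkpoint carry a prefix that needs stripping is an assumption you should verify by printing them.

```python
def extract_state_dict(checkpoint, prefix=""):
    """Return only the weights from a Lightning-style checkpoint dict.

    If `prefix` is given, keep only keys starting with it and strip it off.
    """
    weights = checkpoint["state_dict"]
    if prefix:
        weights = {
            key[len(prefix):]: value
            for key, value in weights.items()
            if key.startswith(prefix)
        }
    return dict(weights)


def convert(ckpt_path, pt_path, prefix=""):
    """Load a .ckpt, keep only the weights, and write them to a .pt file."""
    import torch  # imported lazily so the helper above has no dependencies

    checkpoint = torch.load(ckpt_path, map_location="cpu")
    torch.save(extract_state_dict(checkpoint, prefix), pt_path)


# Demo with a stand-in dict shaped like the keys named in the error message.
fake_checkpoint = {
    "epoch": 3,
    "global_step": 100,
    "state_dict": {"linear.weight": [[0.25, 0.75]]},
}
weights = extract_state_dict(fake_checkpoint)
print(list(weights))
```

With PyTorch installed, `convert("mixer.ckpt", "mixer.pt")` would write the smaller file; both file names here are placeholders.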

    opened by KimberleyJensen 0
  • how to use auto_lr_find (NameError: name 'trainer' is not defined)


    Hi, I am trying to use Lightning's auto_lr_find. I set it to true and ran the command "trainer.tune(model)", but I get the error:

    NameError: name 'trainer' is not defined

    Do you have any advice for this?

    opened by lyndonlauder 0
  • Valid bug fix


    • Calling self.log in the validation step with sync_dist=True is not enough to log the exact validation loss for musdb18, especially when using multiple GPUs.

    Environment

    • 4 gpus ddp
    • 14 tracks in validation step
    • 1 batched dataloader

    outcomes

    • 16 tracks are evaluated instead of 14

    • why?

      • node1: 0 1 2 3
      • node2: 4 5 6 7
      • node3: 8 9 10 11
      • node4: 12 13 0 1 <= the trailing 0 and 1 are not supposed to be here
    • This seems to be a bug in pytorch-lightning.

    • Using drop_last=True does not resolve this issue either.

    • I added an admittedly ugly workaround to fix this.
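
The count of 16 instead of 14 is consistent with how PyTorch's DistributedSampler behaves by default: it pads the index list by wrapping around to the start so that every rank receives the same number of samples. The sketch below re-implements that padding rule in plain Python for illustration only. It is not the fix referred to above, and the function name padded_shards is hypothetical. Note that the real sampler deals indices out round-robin rather than in contiguous blocks, so the per-rank assignment differs from the listing above even though the duplicated tracks are the same.

```python
import math
from collections import Counter


def padded_shards(num_items, world_size):
    """Split indices across ranks the way a padding distributed sampler would."""
    per_rank = math.ceil(num_items / world_size)
    total = per_rank * world_size
    indices = list(range(num_items))
    # Pad by repeating indices from the start until the list divides evenly.
    indices += indices[: total - num_items]
    # Rank r takes every world_size-th index starting at position r.
    return [indices[rank:total:world_size] for rank in range(world_size)]


shards = padded_shards(num_items=14, world_size=4)
evaluated = [index for shard in shards for index in shard]
duplicates = sorted(i for i, count in Counter(evaluated).items() if count > 1)

for rank, shard in enumerate(shards):
    print(f"rank {rank}: {shard}")
print(f"{len(evaluated)} evaluations, duplicated tracks: {duplicates}")
```

Any metric averaged over all ranks therefore counts the duplicated tracks twice unless the padding is removed or the duplicates are masked out.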

    opened by ws-choi 0
  • integration: yaml_instrument based on default


    • it works in 155

    • but a double check is recommended

      • [ ] run vocals and check log
      • [ ] run drums and check log
      • [ ] run bass and check log
      • [ ] run other and check log
    opened by ws-choi 0
  • How to train a model that can fully extract the 44100hz frequency


    I want to train a 2-stem model.

    I noticed that some parameters in each model's yaml configuration affect the final frequency cutoff. It seems that multigpu_drums.yaml can handle the full 44100 Hz range, but because num_blocks is reduced (11 => 9), the model size also decreases (29 MB => 21 MB).

    So a configuration like multigpu_drums.yaml covers 44100 Hz in full, but the model shrinks. Does this affect the final accuracy?

    The parameters dim_t, hop_length, overlap, and num_blocks seem to complement each other in a way I cannot understand. Maybe this complementarity was designed for the competition (mixing with Demucs), but I want to apply the model in the real world without Demucs (MDX-Net only; after some testing, I think MDX-Net has more potential than Demucs).

    When I change num_blocks from 9 to 11, the inference results have overlapping and broken voices. Do you have any recommended parameters for training a full 44100 Hz model without loss of accuracy (i.e. without shrinking the model)?
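
One way to reason about the frequency cutoff raised in this question: for a spectrogram model that keeps only the lowest STFT bins, the highest frequency it can represent follows from the FFT size and the number of bins kept. The sketch below is a generic STFT calculation, not code from this repository. The names n_fft and dim_f and the numbers used are assumptions for illustration; only the 44100 Hz sample rate comes from the question. Note that at a 44100 Hz sample rate the representable band tops out at the Nyquist frequency of 22050 Hz.

```python
def cutoff_hz(n_fft, dim_f, sample_rate=44100):
    """Upper frequency covered when only the lowest dim_f STFT bins are kept."""
    bin_width = sample_rate / n_fft  # spacing between adjacent STFT bins in Hz
    nyquist = sample_rate / 2        # highest frequency the signal can contain
    return min(dim_f * bin_width, nyquist)


# Hypothetical settings, chosen only to illustrate the relationship.
full_band = cutoff_hz(n_fft=4096, dim_f=2048)
reduced = cutoff_hz(n_fft=4096, dim_f=1024)
print(full_band, reduced)
```

Under this assumption, the cutoff depends on the ratio of kept bins to FFT size rather than on num_blocks, which would be consistent with a smaller model still covering the full band.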

    opened by dingjibang 8
  • Encounter a error while executing training process


    Hi, I followed all the steps you mentioned. I then ran python run.py experiment=multigpu_vocals model=ConvTDFNet_vocals to start training and encountered the following problem. 💔

    RuntimeError: Early stopping conditioned on metric `val/sdr` which is not available. Pass in or modify your `EarlyStopping` callback to use any of the following: `train/loss`, `train/loss_step`, `train/loss_epoch`
    

    I would appreciate your reply. ✨✨

    opened by WangWilly 2
  • Error on building pesq

    Error on building pesq "pesq/cypesq.c:6:10: fatal error: Python.h: No such file or directory"

    The error occurred while running "pip install -r requirements.txt"


    Solution: install libpythonX.X-dev so that pesq can be built. The version must match your SYSTEM Python version, not the conda Python version, because pesq is built by the gcc on your system.

    ex)
    (mdx-net) cnh2769@SPV02:~/_Project/mdx-net$ which python3
    /home/cnh2769/anaconda3/envs/mdx-net/bin/python3
    (mdx-net) cnh2769@SPV02:~/_Project/mdx-net$ python3 --version
    Python 3.8.11
    (mdx-net) cnh2769@SPV02:~/Project/mdx-net$ /usr/bin/python3 --version
    Python 3.7.11
    (mdx-net) cnh2769@SPV02:~/Project/mdx-net$ sudo apt install libpython3.7-dev
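
The version lookup in the example above can also be scripted. This is a minimal sketch, assuming the system interpreter lives at /usr/bin/python3 as in that session and that the package follows the libpythonX.Y-dev naming used there. The function names are hypothetical.

```python
import re
import subprocess


def dev_package_for(version_output):
    """Turn output such as 'Python 3.7.11' into 'libpython3.7-dev'."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    major, minor = match.groups()
    return f"libpython{major}.{minor}-dev"


def system_dev_package(python="/usr/bin/python3"):
    """Ask the system interpreter (not the conda one) for its version."""
    result = subprocess.run(
        [python, "--version"], capture_output=True, text=True, check=True
    )
    return dev_package_for(result.stdout or result.stderr)


# Using the version string from the session above.
package = dev_package_for("Python 3.7.11")
print(f"sudo apt install {package}")
```

Calling system_dev_package() on the affected machine prints the package name to install; it only reads the version and does not install anything.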

    opened by cnh2769 1
  • TFC-TDF-U-Net's performance on Musdb18


    Our main approach is based on TFC-TDF-U-Net [3].

    This model was originally proposed for singing voice separation, but it turns out that it also performs well for other musical instruments (drums, bass, other).

    [3] Choi, Woosung, et al. "Investigating u-nets with various intermediate blocks for spectrogram-based singing voice separation." 21st International Society for Music Information Retrieval Conference, ISMIR. 2020.

    good first issue 
    opened by ws-choi 0