KUIELab-MDX-Net took 2nd place on Leaderboard A and 3rd place on Leaderboard B in the MDX-Challenge ISMIR 2021

IELab@ Korea University

Last update: Dec 28, 2022

Related tags

Deep Learning pytorch hydra source-separation music-source-separation pytorch-lightning wandb mdx-challenge

Overview

KUIELab-MDX-Net

presentation slide

0. Environment

Ubuntu 20.04
at least four CUDA-capable GPUs (each an RTX 2080 Ti or better)
1.5 TB disk storage for data augmentation
wandb for logging

You must also create a .env file by copying .env.sample, then set the environment variables in it.

wandb_api_key=[Your Key] # "xxxxxxxxxxxxxxxxxxxxxxxx"
data_dir=[Your Path] # "/home/ielab/repos/musdbHQ"

About wandb_api_key
- We currently support only wandb for logging.
- To get your wandb_api_key, visit wandb, open Settings, and copy your API key.
About data_dir
- The absolute path where the datasets are stored.
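The two variables above can be checked before a long training run starts. The following is a minimal sketch, not part of the repository: parse_env and check_env are hypothetical helper names, and the parser assumes simple KEY=VALUE lines with optional trailing # comments, as in the sample above.

```python
from pathlib import PurePosixPath

# Variable names taken from the .env sample above.
REQUIRED_KEYS = ("wandb_api_key", "data_dir")


def parse_env(text):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments.

    Assumption: values never contain a literal '#'.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip('"')
    return values


def check_env(values):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for key in REQUIRED_KEYS:
        if not values.get(key):
            problems.append(f"{key} is missing or empty")
    data_dir = values.get("data_dir")
    # The README targets Ubuntu, so the path is checked as a POSIX path.
    if data_dir and not PurePosixPath(data_dir).is_absolute():
        problems.append("data_dir should be an absolute path")
    return problems
```

For example, check_env(parse_env(open(".env").read())) returns an empty list when both variables are set and data_dir is absolute.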

1. Installation

conda env create -f conda_env_gpu.yaml -n mdx-net
conda activate mdx-net
pip install -r requirements.txt
sudo apt-get install soundstretch

2. Training & Submission

3. Leaderboard A vs Leaderboard B

The main difference between the Leaderboard_A and Leaderboard_B branches is whether the Musdb18 test dataset is used for training.
- Leaderboard A does not use the test dataset for training: https://github.com/kuielab/mdx-net/blob/Leaderboard_A/configs/experiment/multigpu_default.yaml
- Leaderboard B uses the test dataset for training: https://github.com/kuielab/mdx-net/blob/b45eff172928dc9fc31852ee65072fb01f4c2d08/configs/experiment/multigpu_default.yaml#L16

ACKNOWLEDGEMENT

- Lightning-Hydra Template, on which this repository is based
- The repository of TFC-TDF-U-Net, from our previous ISMIR 2020 paper
- facebook/demucs

Comments

Encountered errors while executing training process #2

(Using Leaderboard_B) First I was stuck solving the environment and I let it sit for 30 min, but conda never finished creating the env from the yml. Because I was using a cloud instance, I didn't have time to wait and I did this instead:

conda create -n mdx-net
conda update conda
conda config --add channels conda-forge
conda activate mdx-net
sudo apt-get install soundstretch
python -m pip install -r requirements.txt
python src/utils/data_augmentation.py --data_dir /real/path/to/musdbhq/ --train True --test True

It seems that the model doesn't allow me to train it with songs that don't contain vocals.

python src/utils/data_augmentation.py --data_dir /home/ubuntu/mdx-files/musdb/ --train True --test True
 10%|███████████████▉                                                                                                                                                     | 11/114 [01:13<11:25,  6.65s/it]
Traceback (most recent call last):
  File "src/utils/data_augmentation.py", line 111, in <module>
    main(parser.parse_args())
  File "src/utils/data_augmentation.py", line 30, in main
    save_shifted_dataset(p, t, data_dir, 'train')
  File "src/utils/data_augmentation.py", line 92, in save_shifted_dataset
    source = load_wav(in_path.joinpath(s_name+'.wav'))
  File "src/utils/data_augmentation.py", line 102, in load_wav
    return sf.read(path, samplerate=sr, dtype='float32')[0].T
  File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 256, in read
    with SoundFile(file, 'r', samplerate, channels,
  File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 1183, in _open
    _error_check(_snd.sf_error(file_ptr),
  File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening '/home/ubuntu/mdx-files/musdb/train/Artificial Intelligence - Native Instruments/vocals.wav': System error.

I deleted the songs that didn't contain vocals, then the data augmentation succeeded, but all attempts to train failed and I didn't have time to do debugging in the cloud GPU instance.
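The RuntimeError above is raised when vocals.wav cannot be opened for one track. A minimal sketch for listing such tracks before running the augmentation script, instead of finding them one crash at a time: find_incomplete_tracks is a hypothetical helper, not part of the repository, and the stem names are assumed to follow the usual musdb18-HQ layout (only vocals.wav is confirmed by the traceback).

```python
from pathlib import Path

# Assumed stem file names; only "vocals" appears in the traceback above.
STEMS = ("mixture", "vocals", "drums", "bass", "other")


def find_incomplete_tracks(data_dir, split="train"):
    """Map each track folder name to the stems that are missing or empty."""
    incomplete = {}
    for track in sorted(Path(data_dir, split).iterdir()):
        if not track.is_dir():
            continue
        missing = []
        for stem in STEMS:
            wav = track / f"{stem}.wav"
            # A missing or zero-byte file cannot be read as audio.
            if not wav.is_file() or wav.stat().st_size == 0:
                missing.append(stem)
        if missing:
            incomplete[track.name] = missing
    return incomplete
```

This checks only that the files exist and are non-empty. A file that exists but is corrupted would still need to be opened with an audio library to be detected.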

Here is the output from: python run.py experiment=multigpu_other model=ConvTDFNet_other

/usr/lib/python3/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /usr/lib/python3/dist-packages/torchvision/image.so: undefined symbol: _ZNK3c106IValue23reportToTensorTypeErrorEv
  warn(f"Failed to load image Python extension: {e}")
Traceback (most recent call last):
  File "run.py", line 7, in <module>
    from pytorch_lightning.utilities import rank_zero_info
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 16, in <module>
    from torchmetrics import Accuracy as _Accuracy
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/__init__.py", line 14, in <module>
    from torchmetrics import functional  # noqa: E402
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/__init__.py", line 14, in <module>
    from torchmetrics.functional.audio.pit import permutation_invariant_training, pit, pit_permutate
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/audio/__init__.py", line 26, in <module>
    from torchmetrics.functional.audio.pesq import perceptual_evaluation_speech_quality  # noqa: F401
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/audio/pesq.py", line 20, in <module>
    import pesq as pesq_backend
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pesq/__init__.py", line 5, in <module>
    from ._pesq import pesq, pesq_batch
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pesq/_pesq.py", line 8, in <module>
    from .cypesq import cypesq, cypesq_retvals, cypesq_error_message as pesq_error_message
  File "__init__.pxd", line 238, in init cypesq
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 80 from PyObject

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   35C    P0    36W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   34C    P0    33W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
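The "numpy.ndarray size changed" ValueError in the traceback above usually means that a compiled extension (here the cypesq module of pesq) was built against a different NumPy version than the one currently installed. A minimal sketch for collecting the installed versions of the packages named in the traceback, so they can be compared with requirements.txt: installed_versions is a hypothetical helper, and the package list is an assumption based on the module names above.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names assumed from the modules in the traceback.
    names = ["numpy", "pesq", "torchmetrics", "pytorch-lightning", "torch", "torchvision"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

If the NumPy version differs from the one pesq was compiled against, reinstalling pesq after NumPy so that it is rebuilt is a common remedy. Whether that resolves this particular setup is not confirmed by the thread.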

opened by Ma5onic 8

mskim revision
[x] Please check that the class below is designed correctly.

https://github.com/kuielab/mdx-net/blob/b9eb84de886143dcb4842ad2daa613ac0a8f6308/src/models/mdxnet.py#L128
question
opened by ws-choi 3
Separation without training

Hello, is it possible to run separation using KUIELab-MDX-Net without training the model from scratch? Do you share some pretrained models? It would ease the evaluation of your solution and its usage in a real-case scenario.

opened by kichel98 2
Data Augmentation
[x] implement test code for below data.zip

Originally posted by @rlaalstjr47 in https://github.com/kuielab/mdx-net/issues/3#issuecomment-873558168

TODO

[ ] gen script with hydra

[x] auto metadata caching for data augmentation

[ ] parameterized data aug with hydra

[ ] debugging to check if it is well-designed

enhancement
opened by ws-choi 2
How do you save the mixer model?

Hi !

I am trying to train the mixer model, but it only saves a .ckpt file of around 58 MB. When I run predict_blend with my mixer checkpoint, I get this error:

RuntimeError: Error(s) in loading state_dict for Mixer: Missing key(s) in state_dict: "linear.weight". Unexpected key(s) in state_dict: "epoch", "global_step", "pytorch-lightning_version", "state_dict", "hparams_name", "hyper_parameters".

In your repo the mixer model is very small, and it says to "save .pt, the only learnable parameters in Mixer". Could you tell me how to do this, please?

Thanks !
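The error message lists "state_dict" among the unexpected keys, which suggests that the whole Lightning checkpoint dictionary was passed to load_state_dict instead of the weights stored inside it. A minimal sketch of extracting those weights and saving them as a .pt file: extract_state_dict and convert are hypothetical helpers, not the authors' procedure, and the prefix and keep arguments are assumptions to cover the case where the keys are nested under a submodule name or mixed with other weights.

```python
def extract_state_dict(checkpoint, prefix="", keep=None):
    """Pull the weights out of a Lightning-style checkpoint dictionary.

    prefix: optional key prefix to strip (e.g. "model."); keys without it are skipped.
    keep:   optional collection of key names to retain after stripping.
    """
    extracted = {}
    for key, value in checkpoint["state_dict"].items():
        if prefix:
            if not key.startswith(prefix):
                continue
            key = key[len(prefix):]
        if keep is None or key in keep:
            extracted[key] = value
    return extracted


def convert(ckpt_path, out_path, prefix="", keep=("linear.weight",)):
    """Load a .ckpt file and save only the selected weights as a .pt file."""
    import torch  # imported here so extract_state_dict has no dependencies

    # Newer torch versions may need weights_only=False for Lightning checkpoints.
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    torch.save(extract_state_dict(checkpoint, prefix, keep), out_path)
```

Printing list(checkpoint["state_dict"].keys()) first shows whether a prefix is needed. The key name "linear.weight" is taken from the error message above.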

opened by KimberleyJensen 0
how to use auto_lr_find (NameError: name 'trainer' is not defined)

Hi, I am trying to use Lightning's auto_lr_find. I set it to true and ran the command "trainer.tune(model)", but I get this error:

NameError: name 'trainer' is not defined

please do you have any advice for this?

opened by lyndonlauder 0
Valid bug fix
Calling self.log with sync_dist=True in the validation step is not enough to log the exact validation loss for musdb18, especially when using multiple GPUs.

Environment

- 4 GPUs, DDP
- 14 tracks in the validation step
- 1 batched dataloader

Outcomes

16 tracks, not 14, are evaluated.

Why?

node1: 0 1 2 3
node2: 4 5 6 7
node3: 8 9 10 11
node4: 12 13 0 1 <= the trailing 0 and 1 are duplicates and are not supposed to be here

This seems to be a bug in pytorch-lightning. Using drop_last=True does not fix the issue either.

I updated the code with an ugly fix for this.
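The count of 16 can be reproduced without any GPU. The following is a minimal sketch, assuming the validation sampler behaves like the default of torch.utils.data.DistributedSampler (no shuffling, drop_last=False), which pads the index list by repeating its first entries until it divides evenly across processes. padded_shards is a hypothetical name, and the strided assignment below differs from the contiguous layout drawn above, although the duplicated tracks are the same.

```python
import math


def padded_shards(num_samples, world_size):
    """Split sample indices across processes, padding by repeating the first ones."""
    indices = list(range(num_samples))
    total = math.ceil(num_samples / world_size) * world_size
    # Pad so that every process receives the same number of samples.
    indices += indices[: total - len(indices)]
    # Each rank takes every world_size-th index, starting at its own rank.
    return [indices[rank:total:world_size] for rank in range(world_size)]


shards = padded_shards(14, 4)
evaluated = sum(len(shard) for shard in shards)
duplicates = sorted(
    {i for shard in shards for i in shard if sum(s.count(i) for s in shards) > 1}
)
print(evaluated, duplicates)
```

With 14 tracks and 4 processes, tracks 0 and 1 are evaluated twice, so an average over all evaluated items is slightly biased toward them unless the duplicates are removed before reducing.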
opened by ws-choi 0
integration: yaml_instrument based on default
it works in 155

but double check is recommended

[ ] run vocals and check log

[ ] run drums and check log

[ ] run bass and check log

[ ] run other and check log
opened by ws-choi 0
How to train a model that can fully extract the 44100hz frequency

I want to train a 2-stem model.

I noticed that some parameters in each model's yaml configuration affect the final frequency cutoff. It seems that multigpu_drums.yaml can handle the full 44100 Hz range, but with the reduction of num_blocks (11 => 9), the model size also decreases (29 MB => 21 MB).

So a configuration like multigpu_drums.yaml covers the full 44100 Hz range, but the model shrinks. Does this affect the final accuracy?

It seems that dim_t, hop_length, overlap, and num_blocks complement each other in a way I cannot understand. Maybe this complementarity was designed for the competition (mixing with demucs), but I want to apply this in the real world without demucs (mdx-net only; after some testing, I think mdx-net has more potential than demucs).

When I change num_blocks from 9 to 11, the inference results contain overlapping and broken voices. Do you have any parameter recommendations for training a full 44100 Hz model without loss of accuracy (i.e. without shrinking the model)?
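For a spectrogram-based model, the frequency cutoff follows from how many STFT bins the network keeps. The following is a minimal sketch of that arithmetic: the parameter names dim_f (number of frequency bins kept) and n_fft (FFT size) are assumptions and do not appear in the question above, and the numeric values are illustrative rather than taken from the repository's configs. Note that audio sampled at 44100 Hz contains frequencies only up to 22050 Hz.

```python
def cutoff_hz(dim_f, n_fft, sample_rate=44100):
    """Approximate upper frequency covered by the first dim_f STFT bins.

    Each STFT bin is sample_rate / n_fft Hz wide.
    """
    return dim_f * sample_rate / n_fft


def bins_for_full_band(n_fft):
    """Number of bins a real-valued STFT produces (0 Hz up to Nyquist)."""
    return n_fft // 2 + 1


# Illustrative values only: same number of bins, different FFT sizes.
print(cutoff_hz(2048, 4096))  # bins span up to about 22050 Hz
print(cutoff_hz(2048, 6144))  # bins span up to about 14700 Hz
```

Under these assumptions, covering the full band means either keeping more bins or using a smaller FFT size, which trades frequency resolution for bandwidth. This may be part of the interplay between parameters that the question describes.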

opened by dingjibang 8
Encountered an error while executing the training process
Hi, I followed all the steps you mentioned. I then ran python run.py experiment=multigpu_vocals model=ConvTDFNet_vocals to start training and encountered the following problem. 💔

RuntimeError: Early stopping conditioned on metric `val/sdr` which is not available. Pass in or modify your `EarlyStopping` callback to use any of the following: `train/loss`, `train/loss_step`, `train/loss_epoch`

I would appreciate your reply. ✨✨
opened by WangWilly 2
Error on building pesq "pesq/cypesq.c:6:10: fatal error: Python.h: No such file or directory"

This error was encountered while running "pip install -r requirements.txt".

Solution: We need to install libpythonX.X-dev to build pesq. The version should match your SYSTEM Python version, not the conda Python version, because pesq is built with the gcc on your system.

ex)
(mdx-net) cnh2769@SPV02:~/_Project/mdx-net$ which python3
/home/cnh2769/anaconda3/envs/mdx-net/bin/python3
(mdx-net) cnh2769@SPV02:~/_Project/mdx-net$ python3 --version
Python 3.8.11
(mdx-net) cnh2769@SPV02:~/Project/mdx-net$ /usr/bin/python3 --version
Python 3.7.11
(mdx-net) cnh2769@SPV02:~/Project/mdx-net$ sudo apt install libpython3.7-dev
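The package name in the example is derived from the major and minor version reported by the system interpreter. A minimal sketch of that mapping, following the naming used in the comment above: dev_package_for is a hypothetical helper, and it only builds the name without checking that the package exists for your distribution.

```python
import re


def dev_package_for(version_output):
    """Turn output such as 'Python 3.7.11' into 'libpython3.7-dev'."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    major, minor = match.groups()
    return f"libpython{major}.{minor}-dev"


print(dev_package_for("Python 3.7.11"))
```

The input would be the output of /usr/bin/python3 --version, as in the transcript above.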

opened by cnh2769 1
TFC-TDF-U-Net's performance on Musdb18
Our main approach is based on TFC-TDF-U-Net [3].

This model was originally proposed for singing voice separation, but it turns out that it also performs well on the other musical instruments (drums, bass, other).

For more information, see page 53 of the following dissertation:

Choi, Woosung, Deep Learning-based Latent Source Analysis for Source-aware Audio Manipulation. PhD Dissertation. Korea University, 2021.

TFC-TDF-U-Net Repository

[3] Choi, Woosung, et al. "Investigating u-nets with various intermediate blocks for spectrogram-based singing voice separation." 21st International Society for Music Information Retrieval Conference, ISMIR. 2020.
good first issue
opened by ws-choi 0
