# LQVAE-separation

Code for "Unsupervised Source Separation via Bayesian Inference in the Latent Domain".
## Samples
|  | GT Compressed | Separated |
|---|---|---|
| Drums | GT Compressed Drums | Separated Drums |
| Bass | GT Compressed Bass | Separated Bass |
| Mix | GT Compressed Mix | Separated Mix |
The separation is performed in a latent domain with 64x temporal compression. The results can be upsampled via the Jukebox upsamplers in order to improve perceptual quality (WIP).
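For a rough sense of scale (illustrative numbers only, assuming the 64x temporal compression stated above):

```python
# Rough size of the topmost latent sequence for a 3-second chunk at 22,050 Hz
# (illustrative only; assumes the 64x temporal compression stated above).
sample_rate = 22050
seconds = 3
compression = 64                        # raw audio samples per latent code

n_samples = sample_rate * seconds       # 66150 raw samples
n_codes = n_samples // compression      # ~1033 latent codes to separate
print(n_samples, n_codes)
```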
## Install

Install the conda package manager from https://docs.conda.io/en/latest/miniconda.html

```
conda create --name lqvae-separation python=3.7.5
conda activate lqvae-separation
pip install mpi4py==3.0.3
pip install ffmpeg-python==0.2.0
pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2
pip install -r requirements.txt
pip install -e .
```
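As an optional sanity check (not part of the original instructions), you can verify that the pinned PyTorch build sees your GPU before training or separating:

```
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```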
## Checkpoints

- Enter the `script/` folder and create the folders `checkpoints/` and `results/`.
- Download the checkpoints contained in this Google Drive folder and put them inside `checkpoints/`.
## Separation with checkpoints

- Call the following in order to perform `bs` separations of 3 seconds, starting from second `shift` of the mixture created from the sources in `path_1` and `path_2`. The sources must be WAV files sampled at 22 kHz.

```
PYTHONPATH=.. python bayesian_inference.py --shift=shift --path_1=path_1 --path_2=path_2 --bs=bs
```

- The default value for `bs` is `64`, which can be handled by an RTX 3080 with 16 GB of VRAM. Lower the value if you get `CUDA: out of memory`.
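The sources passed via `--path_1` / `--path_2` must already be WAV files at 22 kHz. If yours are at a different sample rate, a minimal preprocessing sketch (hypothetical helper, using the pinned torchaudio; the mono downmix is an assumption) could look like this:

```python
# Hypothetical preprocessing sketch: resample an arbitrary WAV to 22,050 Hz
# (and downmix to mono, which is an assumption) before passing it to
# bayesian_inference.py as --path_1 / --path_2.
import torchaudio

def to_22khz(src_path: str, dst_path: str, target_sr: int = 22050):
    wav, sr = torchaudio.load(src_path)                        # (channels, samples)
    wav = wav.mean(dim=0, keepdim=True)                        # downmix to mono
    if sr != target_sr:
        wav = torchaudio.transforms.Resample(sr, target_sr)(wav)
    torchaudio.save(dst_path, wav, target_sr)

to_22khz("drums_44k.wav", "drums_22k.wav")   # then use drums_22k.wav as --path_1
to_22khz("bass_44k.wav", "bass_22k.wav")     # and bass_22k.wav as --path_2
```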
## Training

### LQ-VAE
- The `vqvae/vqvae.py` file of Jukebox has been modified to include the linearization loss of the LQ-VAE (it is computed at all levels of the hierarchical VQ-VAE, but we only care about the topmost level, given that we perform separation there). One can train a new LQ-VAE on custom data (here `data/train` for train and `data/test` for test) by running the following from the root of the project:

```
PYTHONPATH=. mpiexec -n 1 python jukebox/train.py --hps=vqvae --sample_length=131072 --bs=8 --audio_files_dir=data/train/ --labels=False --train --test --aug_shift --aug_blend --name=lq_vae --test_audio_files_dir=data/test
```

- The trained model uses the `vqvae` hyperparameters in `hparams.py`, so if you want to change the levels / downsampling factors you have to modify them there.
- The only constraint for training the LQ-VAE is to use an even batch size, given its use of pairs in the loss.
- Given that `L_lin` enforces the sum operation on the latent domain, you can use the data of both sources together (or any other audio data); a conceptual sketch of the loss follows this list.
- Checkpoints are saved in `logs/lq_vae` (`lq_vae` is the `name` parameter).
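For intuition only, here is a conceptual sketch of what a linearization loss of this kind computes (hypothetical names; the actual implementation lives in the modified `vqvae/vqvae.py` and is computed per level):

```python
# Conceptual sketch (not the repo's code): L_lin pushes the encoder towards
# additivity, i.e. the latent of a sum of two signals should be close to the
# sum of the two signals' latents. The batch is split into pairs, which is why
# an even batch size is required.
import torch

def linearization_loss(encode, x):
    # x: (batch, 1, T) audio batch with an even batch size
    x1, x2 = x.chunk(2, dim=0)                  # form pairs
    z1, z2 = encode(x1), encode(x2)             # latents of the individual signals
    z_mix = encode(x1 + x2)                     # latent of the pairwise mixture
    return torch.mean((z_mix - (z1 + z2)) ** 2)
```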
### Priors
- After training the LQ-VAE, train two priors on two different classes by calling:

```
PYTHONPATH=. mpiexec -n 1 python jukebox/train.py --hps=vqvae,small_prior,all_fp16,cpu_ema --name=pior_source --audio_files_dir=data/source/train --test_audio_files_dir=data/source/test --labels=False --train --test --aug_shift --aug_blend --prior --levels=3 --level=2 --weight_decay=0.01 --save_iters=1000 --min_duration=24 --sample_length=1048576 --bs=16 --n_ctx=8192 --sample=True --sample_iters=1000 --restore_vqvae=logs/lq_vae/checkpoint_lq_vae.pth.tar
```

- Here the data of the source is located in `data/source/train` and `data/source/test`, and we assume the LQ-VAE has 3 levels (topmost level = 2).
- The Transformer model is defined by the `small_prior` parameters in `hparams.py` and uses a context of `n_ctx=8192` codes.
- The checkpoint path of the LQ-VAE trained in the previous step must be passed to `--restore_vqvae`.
- Checkpoints are saved in `logs/pior_source` (`pior_source` is the `name` parameter).
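If a source's data is not already split, a small helper (hypothetical, not part of the repo) can create the `data/source/train` / `data/source/test` layout expected by the command above:

```python
# Hypothetical helper: randomly split a folder of WAV files into the
# train/test layout used by the prior training command.
import random
import shutil
from pathlib import Path

def split_source(src_dir: str, dst_dir: str, test_fraction: float = 0.1, seed: int = 0):
    files = sorted(Path(src_dir).glob("*.wav"))
    random.Random(seed).shuffle(files)
    n_test = max(1, int(len(files) * test_fraction))
    for split, subset in (("test", files[:n_test]), ("train", files[n_test:])):
        out = Path(dst_dir) / split
        out.mkdir(parents=True, exist_ok=True)
        for f in subset:
            shutil.copy(f, out / f.name)

split_source("raw/drums", "data/drums")   # -> data/drums/train and data/drums/test
```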
### Codebook sums

- Before separation, the sums between all codes must be computed using the LQ-VAE. This can be done with `codebook_precalc.py` in the `script` folder:

```
PYTHONPATH=.. python codebook_precalc.py --save_path=checkpoints/codebook_sum_precalc.pt --restore_vqvae=../logs/lq_vae/checkpoint_lq_vae.pth.tar --raw_to_tokens=64 --l_bins=2048 --sample_rate=22050 --alpha=[0.5, 0.5] --downs_t=(2, 2, 2) --commit=1.0 --emb_width=64
```
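Conceptually (a sketch under assumptions, not the script's actual implementation), the precalc builds an `l_bins x l_bins` lookup table: the codebook vectors of every pair of codes are mixed with the `--alpha=[0.5, 0.5]` weights and re-quantized, so that at inference time the code of a sum of two sources can be read directly from the table:

```python
# Conceptual sketch of the codebook-sum table (hypothetical, not the repo's code):
# sums[i, j] = index of the codebook entry closest to the alpha-weighted sum of
# the vectors of codes i and j.
import torch

def build_codebook_sums(codebook: torch.Tensor, alpha=(0.5, 0.5)) -> torch.Tensor:
    # codebook: (l_bins, emb_width) tensor of codebook vectors
    l_bins = codebook.shape[0]
    sums = torch.empty(l_bins, l_bins, dtype=torch.long)
    for i in range(l_bins):
        mixed = alpha[0] * codebook[i].unsqueeze(0) + alpha[1] * codebook  # (l_bins, emb_width)
        dists = torch.cdist(mixed, codebook)                               # distance to every code
        sums[i] = dists.argmin(dim=1)                                      # nearest code for each sum
    return sums
```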
## Separation with trained checkpoints

- Trained checkpoints can be given to `bayesian_inference.py` as follows:

```
PYTHONPATH=.. python bayesian_inference.py --shift=shift --path_1=path_1 --path_2=path_2 --bs=bs --restore_vqvae=checkpoints/checkpoint_step_60001_latent.pth.tar --restore_priors 'checkpoints/checkpoint_drums_22050_latent_78_19k.pth.tar' 'checkpoints/checkpoint_latest.pth.tar' --sum_codebook=checkpoints/codebook_precalc_22050_latent.pt
```

- `--restore_priors` accepts two paths, pointing to the first and second prior checkpoints respectively.
## Evaluation

- In order to evaluate the pre-trained checkpoints, run `bayesian_test.py` after you have put the full Slakh drums and bass validation split inside `data/bass/validation` and `data/drums/validation`.
## Future work

- Training of upsamplers to increase the quality of the separations
- A better rejection sampling method (e.g., using verifiers as in https://arxiv.org/abs/2110.14168)
## Citations

If you find the code useful for your research, please consider citing:

```
@article{mancusi2021unsupervised,
  title={Unsupervised Source Separation via Bayesian Inference in the Latent Domain},
  author={Mancusi, Michele and Postolache, Emilian and Fumero, Marco and Santilli, Andrea and Cosmo, Luca and Rodol{\`a}, Emanuele},
  journal={arXiv preprint arXiv:2110.05313},
  year={2021}
}
```
as well as the Jukebox baseline:
- Dhariwal, P., Jun, H., Payne, C., Kim, J. W., Radford, A., & Sutskever, I. (2020). Jukebox: A generative model for music. arXiv preprint arXiv:2005.00341.