E2e music remastering system - End-to-end Music Remastering System Using Self-supervised and Adversarial Training

Overview

End-to-end Music Remastering System

This repository includes source code and pre-trained models of the work End-to-end Music Remastering System Using Self-supervised and Adversarial Training by Junghyun Koo, Seungryeol Paik, and Kyogu Lee.

We provide inference code for the proposed system, which alters the mastering style of a song to match that of a desired reference track.

arXiv Demo Page

Pre-trained Models

Model                 | Number of Epochs Trained | Details
Music Effects Encoder | 1000                     | Trained with the MTG-Jamendo Dataset
Mastering Cloner      | 1000                     | Trained with the above pre-trained Music Effects Encoder and a Projection Discriminator

Inference

To run the inference code:

  1. Download the pre-trained models above and place them under the folder named 'model_checkpoints' (default).
  2. Prepare input and reference tracks under the folder named 'inference_samples' (default).
    Target files should be organized as follows:
    "path_to_data_directory"/"song_name_#1"/input.wav
    "path_to_data_directory"/"song_name_#1"/reference.wav
    ...
    "path_to_data_directory"/"song_name_#n"/input.wav
    "path_to_data_directory"/"song_name_#n"/reference.wav
  3. Run 'inference.py':
    python inference.py \
        --ckpt_dir "path_to_checkpoint_directory" \
        --data_dir_test "path_to_directory_containing_inference_samples"
  4. Outputs will be stored under the folder 'inference_samples' (default).
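
The folder layout required for the input and reference tracks can be checked before running inference. The following is a minimal sketch, not part of the repository: the helper name `find_incomplete_songs` is hypothetical, and the only assumption is the layout stated above (one sub-folder per song, each containing `input.wav` and `reference.wav`).

```python
from pathlib import Path

# File names the README requires inside every song folder.
REQUIRED_FILES = ("input.wav", "reference.wav")


def find_incomplete_songs(data_dir):
    """Map each song folder under data_dir to the required files it lacks.

    Folders that contain both required files are left out of the result,
    so an empty dict means the layout looks complete.
    """
    problems = {}
    for song_dir in sorted(Path(data_dir).iterdir()):
        if not song_dir.is_dir():
            continue  # ignore stray files next to the song folders
        missing = [name for name in REQUIRED_FILES
                   if not (song_dir / name).is_file()]
        if missing:
            problems[song_dir.name] = missing
    return problems
```

Calling `find_incomplete_songs("inference_samples")` returns an empty dict when every song folder holds both files; otherwise it names each incomplete folder and the files missing from it.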

Note: The system accepts stereo WAV files with a 44.1 kHz sample rate and 16-bit depth. Target files should be named "input.wav" and "reference.wav".
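
Whether a file meets these format requirements can be verified with Python's standard `wave` module. This is a hedged sketch rather than code from the repository: the function name `check_wav_format` and its default arguments are illustrative, and the `wave` module reads only uncompressed PCM WAV files, so other WAV encodings may raise `wave.Error` instead of being reported.

```python
import wave


def check_wav_format(path, channels=2, sample_rate=44100, sample_width_bytes=2):
    """Return a list of format mismatches; an empty list means the file matches.

    Defaults correspond to stereo, 44.1 kHz, 16-bit (2 bytes per sample).
    """
    issues = []
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != channels:
            issues.append(
                f"expected {channels} channels, found {wav.getnchannels()}")
        if wav.getframerate() != sample_rate:
            issues.append(
                f"expected {sample_rate} Hz, found {wav.getframerate()} Hz")
        if wav.getsampwidth() != sample_width_bytes:
            issues.append(
                f"expected {8 * sample_width_bytes}-bit samples, "
                f"found {8 * wav.getsampwidth()}-bit")
    return issues
```

Running it on both `input.wav` and `reference.wav` of a song folder before inference surfaces mono, resampled, or 24-bit files early.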

Configuration of each sub-network

config_table

The detailed configuration of each sub-network can also be found at

Self_Supervised_Music_Remastering_System/configs.yaml

Comments
  • IndexError: list index out of range

    Error trying to inference:

    (mdx-submit) C:\Users\lucas\Downloads\e2e_music_remastering_system>python inference.py --data_dir_test "C:\Users\lucas\Downloads\e2e_music_remastering_system\inference_samples\set_1"
    C:\Users\lucas\anaconda3\envs\mdx-submit\lib\site-packages\torchaudio\extension\extension.py:13: UserWarning: torchaudio C++ extension is not available.
      warnings.warn('torchaudio C++ extension is not available.')
    C:\Users\lucas\Downloads\e2e_music_remastering_system\Self_Supervised_Music_Remastering_System\config.py:90: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
      configs = yaml.load(f)
    ---reloaded checkpoint weights - Mastering Cloner---
    ---reloaded checkpoint weights - Music Effects Encoder---
    C:\Users\lucas\anaconda3\envs\mdx-submit\lib\site-packages\torchaudio\extension\extension.py:13: UserWarning: torchaudio C++ extension is not available.
      warnings.warn('torchaudio C++ extension is not available.')
    C:\Users\lucas\anaconda3\envs\mdx-submit\lib\site-packages\torchaudio\extension\extension.py:13: UserWarning: torchaudio C++ extension is not available.
      warnings.warn('torchaudio C++ extension is not available.')
    C:\Users\lucas\anaconda3\envs\mdx-submit\lib\site-packages\torchaudio\extension\extension.py:13: UserWarning: torchaudio C++ extension is not available.
      warnings.warn('torchaudio C++ extension is not available.')
    C:\Users\lucas\anaconda3\envs\mdx-submit\lib\site-packages\torchaudio\extension\extension.py:13: UserWarning: torchaudio C++ extension is not available.
      warnings.warn('torchaudio C++ extension is not available.')
    Traceback (most recent call last):
      File "inference.py", line 159, in <module>
        inf.inference()
      File "inference.py", line 77, in inference
        for step, (whole_song_ori, whole_song_ref, song_name) in enumerate(self.data_loader):
      File "C:\Users\lucas\anaconda3\envs\mdx-submit\lib\site-packages\torch\utils\data\dataloader.py", line 517, in __next__
        data = self._next_data()
      File "C:\Users\lucas\anaconda3\envs\mdx-submit\lib\site-packages\torch\utils\data\dataloader.py", line 1199, in _next_data
        return self._process_data(data)
      File "C:\Users\lucas\anaconda3\envs\mdx-submit\lib\site-packages\torch\utils\data\dataloader.py", line 1225, in _process_data
        data.reraise()
      File "C:\Users\lucas\anaconda3\envs\mdx-submit\lib\site-packages\torch\_utils.py", line 429, in reraise
        raise self.exc_type(msg)
    IndexError: Caught IndexError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "C:\Users\lucas\anaconda3\envs\mdx-submit\lib\site-packages\torch\utils\data\_utils\worker.py", line 202, in _worker_loop
        data = fetcher.fetch(index)
      File "C:\Users\lucas\anaconda3\envs\mdx-submit\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "C:\Users\lucas\anaconda3\envs\mdx-submit\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "C:\Users\lucas\Downloads\e2e_music_remastering_system\Self_Supervised_Music_Remastering_System\data_loader\data_loader.py", line 248, in __getitem__
        song_name = cur_aud_path_ori.split('/')[-2]
    IndexError: list index out of range
    
    
    (mdx-submit) C:\Users\lucas\Downloads\e2e_music_remastering_system>
    
    • The input.wav and reference.wav files are already in the 'set_1' folder.

    Torch version: 1.8.1+cu111 GPU: RTX 2060 (6 GB)

    What can it be?

    opened by lucasbr15 4
  • Are there any details about NT_Xent?

    I wrote the training code myself, but the model does not converge: the loss stays around 5.2. What loss value should I expect once the model has converged? Thanks.

    opened by 980202006 1
  • AssertionError: make sure checkpoint file for the Mastering Cloner named 'mastering_cloner.pt' is under directory

    Hi, I'm excited to try your tool, but I'm getting the following problem when trying to use inference:

    AssertionError: make sure checkpoint file for the Mastering Cloner named 'mastering_cloner.pt' is under directory

    Using inference as described:

    python inference.py --ckpt_dir "C:\Users\lucas\Downloads\e2e_music_remastering_system\Self_Supervised_Music_Remastering_System\model_checkpoints" --data_dir_test "C:\Users\lucas\Downloads\e2e_music_remastering_system\inference_samples\set_1"

    I already put the two checkpoint models in the model_checkpoints folder, but I still get this error.

    What could it be?

    opened by lucasbr15 1
  • Use E2E in Google Colab (no installation required)

    I am also leaving here a Colab notebook that I created for your tool, in case anyone wants to try it without downloading anything:

    https://colab.research.google.com/drive/1QeFdNb-8kftC-HxHqfWnjGZa_99xYW9Y?usp=sharing

    opened by lucasbr15 0
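
On the IndexError reported in the first comment: the traceback ends at `song_name = cur_aud_path_ori.split('/')[-2]`, and every path in that log uses backslashes. Splitting a string that contains no '/' returns a one-element list, so index -2 is out of range, which matches the reported error. This is a reading of the log, not a confirmed diagnosis by the authors. Below is a minimal sketch of a separator-independent way to get the song folder name; the helper `song_name_from_path` is hypothetical and not part of the repository.

```python
from pathlib import PureWindowsPath


def song_name_from_path(audio_path):
    """Return the name of the folder that directly contains the audio file.

    PureWindowsPath accepts both the backslash and the forward slash as
    separators, so one call handles Windows-style and POSIX-style strings.
    Caveat: a POSIX file name that itself contains a backslash would be
    split at that character.
    """
    return PureWindowsPath(audio_path).parent.name


windows_path = r"C:\data\inference_samples\song_a\input.wav"
posix_path = "data/inference_samples/song_a/input.wav"

# With no '/' in the string, split('/') yields a single element,
# which is why indexing with [-2] raises IndexError.
assert len(windows_path.split("/")) == 1

print(song_name_from_path(windows_path))  # song_a
print(song_name_from_path(posix_path))    # song_a
```

The sample paths above are made up for illustration. The Inference section also expects each song in its own sub-folder under the data directory, so the value passed to `--data_dir_test` is worth checking as well.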
Owner
Junghyun (Tony) Koo
Ph.D. Student @ Music and Audio Research Group (MARG), Seoul National University. Interests - intelligent music production