efficient neural audio synthesis in the waveform domain

Overview

neural waveshaping synthesis

real-time neural audio synthesis in the waveform domain

by Ben Hayes, Charalampos Saitis, György Fazekas

This repository is the official implementation of Neural Waveshaping Synthesis.

Model Architecture

Requirements

To install:

pip install -r requirements.txt
pip install -e .

We recommend installing in a virtual environment.

Data

We trained our checkpoints on the URMP dataset. Once downloaded, the dataset can be preprocessed using scripts/create_urmp_dataset.py. This will consolidate recordings of each instrument within the dataset and preprocess them according to the pipeline in the paper.

python scripts/create_urmp_dataset.py \
  --gin-file gin/data/urmp_4second_crepe.gin \
  --data-directory /path/to/urmp \
  --output-directory /path/to/output \
  --device cuda:0  # torch device string for CREPE model

Alternatively, you can supply your own dataset and use the general create_dataset.py script:

python scripts/create_dataset.py \
  --gin-file gin/data/urmp_4second_crepe.gin \
  --data-directory /path/to/dataset \
  --output-directory /path/to/output \
  --device cuda:0  # torch device string for CREPE model

Training

To train a model on the URMP dataset, use this command:

# --instrument selects the URMP instrument by its abbreviated string (e.g. vn)
python scripts/train.py \
  --gin-file gin/train/train_newt.gin \
  --dataset-path /path/to/processed/urmp \
  --urmp \
  --instrument vn \
  --load-data-to-memory

Or to use a non-URMP dataset:

python scripts/train.py \
  --gin-file gin/train/train_newt.gin \
  --dataset-path /path/to/processed/data \
  --load-data-to-memory
Comments
  • colab cpu inference

    Great work! Is it possible to run inference on CPU in Colab? https://colab.research.google.com/github/ben-hayes/neural-waveshaping-synthesis/blob/main/colab/NEWT_Timbre_Transfer.ipynb

    opened by AK391 8
  • Why is input_scale needed in TrainableNonLinearity?

    Hi, in class TrainableNonLinearity there is a parameter input_scale with which the input is multiplied. However, the class is only used in the NEWT class and when it is used the input is already scaled by a trainable parameter (the FiLM class). Doesn't that make input_scale redundant?

    Also, I don't think this is mentioned in the paper.
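The redundancy argument above can be checked numerically. The following is a minimal sketch, assuming FiLM applies an affine map `gamma * x + beta` whose output goes straight into the shaper; `film` and `shaper_input` are hypothetical stand-ins, not the repository's classes. It only shows that the extra scale can be absorbed into the FiLM parameters. It says nothing about whether the extra parameter affects optimisation in practice.

```python
def film(x, gamma, beta):
    # Hypothetical stand-in for FiLM: an affine modulation of the input.
    return gamma * x + beta


def shaper_input(x, gamma, beta, input_scale):
    # Hypothetical stand-in for the value the nonlinearity receives:
    # the FiLM output multiplied by input_scale.
    return input_scale * film(x, gamma, beta)


x, gamma, beta, input_scale = 0.25, 2.0, 1.0, 4.0

# Scaling after FiLM ...
with_scale = shaper_input(x, gamma, beta, input_scale)
# ... equals FiLM alone with rescaled gamma and beta.
absorbed = film(x, input_scale * gamma, input_scale * beta)

print(with_scale, absorbed)  # 6.0 6.0
```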

    opened by vvolhejn 2
  • [bug] Training stopped at 1000 epoch

    Very interesting work! I'm trying to train the model on the URMP dataset to reproduce the results, and training stopped after 1000 epochs. I think this is a bug. The default value of max_epochs in pytorch_lightning.Trainer in version 1.1.2 (the version pinned in requirements.txt) is 1000, which causes the problem (see https://github.com/PyTorchLightning/pytorch-lightning/blob/1.1.2/pytorch_lightning/trainer/trainer.py#L103). Should I use a different version of pytorch-lightning, or should I set max_epochs=None?

    bug question 
    opened by TomohikoNakamura 2
  • Off-by-one error in index computation in FastNEWT

    Hi, in FastNEWT, the continuous array index is computed as:

    idx = self.table_size * (x - self.table_min) / (self.table_max - self.table_min)
    

    I believe the correct version would be

    idx = (self.table_size - 1) * (x - self.table_min) / (self.table_max - self.table_min)
    

    For instance, imagine table_min=0, table_max=1, table_size=2. Then if x == 0.5, we would get idx == 1.0 in the current version, whereas it should be idx == 0.5 (in general, idx == x).
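The worked example above can be reproduced in a few lines. This is a minimal sketch in plain Python rather than the repository's tensor code. `table_index` and its `corrected` flag are hypothetical names introduced here to compare the two formulas side by side.

```python
def table_index(x, table_min, table_max, table_size, corrected=True):
    """Map x in [table_min, table_max] to a continuous lookup-table index."""
    # A table with table_size entries has valid indices 0 .. table_size - 1,
    # so the proposed mapping scales by (table_size - 1).
    scale = (table_size - 1) if corrected else table_size
    return scale * (x - table_min) / (table_max - table_min)


# The case from the issue: table_min=0, table_max=1, table_size=2, x=0.5.
current = table_index(0.5, 0.0, 1.0, 2, corrected=False)
proposed = table_index(0.5, 0.0, 1.0, 2, corrected=True)
print(current, proposed)  # 1.0 0.5

# At the top of the range the uncorrected index lands past the last entry.
top_current = table_index(1.0, 0.0, 1.0, 2, corrected=False)  # 2.0
top_proposed = table_index(1.0, 0.0, 1.0, 2, corrected=True)  # 1.0
```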

    Edit: Additionally, I think the behavior on the left and right sides of the lookup table is different. On the left, there is linear extrapolation, on the right, a constant. I'm running my reimplementation rather than the original code but the computation should be the same.

    The reason is that lower is first clipped and then upper is computed as clip(lower + 1). This means that for values lower than min_value, we have lower == 0 and upper == 1 whereas for values larger than max_value, we have lower == upper == table_size - 1.
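The asymmetry described here can be seen in a small pure-Python sketch. It follows the clipping order stated above (clip `lower`, then `upper = clip(lower + 1)`). The interpolation weight `idx - lower` is an assumption about how the fractional part is formed, and `lookup` is a hypothetical helper, not the FastNEWT code.

```python
import math


def lookup(table, idx):
    """Linear interpolation with lower clipped first and upper = clip(lower + 1)."""
    last = len(table) - 1

    def clip(i):
        return max(0, min(last, i))

    lower = clip(math.floor(idx))
    upper = clip(lower + 1)
    # Assumed weight: distance from the clipped lower index (may leave [0, 1]).
    frac = idx - lower
    return table[lower] + frac * (table[upper] - table[lower])


table = [0.0, 1.0, 4.0]

inside = lookup(table, 0.5)   # 0.5: ordinary interpolation between entries 0 and 1
left = lookup(table, -1.0)    # -1.0: lower == 0, upper == 1, so the first segment is extrapolated
right = lookup(table, 3.5)    # 4.0: lower == upper == last index, so the value stays constant
print(inside, left, right)
```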

    opened by vvolhejn 0
  • Will this work with a newer version of PyTorch than 1.7.1?

    I'm trying to install Neural Waveshaping Synthesis on an NVIDIA Jetson (arm64 with NVIDIA GPU). pip install -r requirements.txt can't seem to find torch==1.7.1. NVIDIA have pre-built wheels for newer versions of PyTorch including 1.10.0. Will a newer version than 1.7.1 work?

    Here's what's available:

    https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-10-now-available/72048

    opened by znmeb 4
  • Add Cog configuration

    Hi @ben-hayes!

    This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

    This also means we can make a web page where other people can try out your model! View it here: https://replicate.ai/ben-hayes/neural-waveshaping-synthesis

    If you click the "Sign in with GitHub" button you can claim your model so you can edit the description, add examples, etc. We'll also feature it on the Explore page.

    opened by andreasjansson 0
Owner
Ben Hayes
AI & Music PhD researcher @ Centre for Digital Music, QMUL