PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Overview

A PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model.

[Figure: a visual representation of the SampleRNN architecture]

It's based on the reference implementation in Theano: https://github.com/soroushmehr/sampleRNN_ICLR2017. Unlike the Theano version, our code allows training models with an arbitrary number of tiers, whereas the original implementation supports a maximum of 3. However, it supports only GRU units (no LSTM). For more details and the motivation behind rewriting this model in PyTorch, see our blog post: http://deepsound.io/samplernn_pytorch.html.

Dependencies

This code requires Python 3.5+ and PyTorch 0.1.12+. Installation instructions for PyTorch are available on their website: http://pytorch.org/. You can install the rest of the dependencies by running pip install -r requirements.txt.

Datasets

We provide a script for creating datasets from single-video YouTube mixes. It downloads a mix, converts it to wav, and splits it into equal-length chunks. To run it you need youtube-dl (a recent version; the latest version from pip should be okay) and ffmpeg. To create an example dataset (4 hours of piano music split into 8-second chunks), run:

cd datasets
./download-from-youtube.sh "https://www.youtube.com/watch?v=EhO_MrRfftU" 8 piano

You can also prepare a dataset yourself. It should be a directory in datasets/ filled with equal-length wav files. Alternatively, you can create your own dataset format by subclassing torch.utils.data.Dataset. It's easy; take a look at dataset.FolderDataset in this repo for an example, or at the sketch below.
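
A minimal sketch of such a subclass (a hypothetical example, not the repo's actual dataset.FolderDataset; it assumes a directory of equal-length wav files and uses librosa, which is already in requirements.txt):

    import os

    import librosa
    import torch
    from torch.utils.data import Dataset


    class WavFolderDataset(Dataset):
        """Serves equal-length wav files from a directory, one tensor per file."""

        def __init__(self, path, sample_rate=16000):
            self.file_names = sorted(
                os.path.join(path, name)
                for name in os.listdir(path) if name.endswith('.wav')
            )
            self.sample_rate = sample_rate

        def __getitem__(self, index):
            # librosa.load resamples to sample_rate and returns float32 in [-1, 1]
            audio, _ = librosa.load(self.file_names[index],
                                    sr=self.sample_rate, mono=True)
            return torch.from_numpy(audio)

        def __len__(self):
            return len(self.file_names)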

Training

To train the model you need to run train.py. All model hyperparameters can be set on the command line. Most hyperparameters have sensible default values, so you don't need to provide all of them. Run python train.py -h for details. To train on the piano dataset using the best hyperparameters we've found, run:

python train.py --exp TEST --frame_sizes 16 4 --n_rnn 2 --dataset piano

The results (training log, loss plots, model checkpoints and generated samples) will be saved in results/.

We also have an option to monitor the metrics using CometML. To use it, just pass your API key to train.py as the --comet_key parameter.
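
For example (YOUR_API_KEY is a placeholder for your own key):

python train.py --exp TEST --frame_sizes 16 4 --n_rnn 2 --dataset piano --comet_key YOUR_API_KEY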

Comments
  • What does n_frame_samples represent?

    The terminology is a bit confusing. What does n_frame_samples mean? Is it the number of samples per frame, or the number of frames?

    Is the RNN taking in a sequence of frames (i.e. one frame per timestep), with n_frame_samples being the dimension of each frame?
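
    A hedged reading of the code (consistent with the SampleRNN paper's tier structure, though not an authoritative quote of model.py): each frame-level tier advances one frame per RNN timestep, and n_frame_samples is the number of raw audio samples a single frame of that tier covers, i.e. the cumulative product of the frame sizes up to that tier:

        import numpy as np

        frame_sizes = [16, 4]                      # as in the README's training command
        n_frame_samples = np.cumprod(frame_sizes)  # array([16, 64])
        # bottom tier: one frame spans 16 raw samples, and that is the frame's
        # input dimension; the tier above spans 16 * 4 = 64 raw samples per frame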

    opened by williamFalcon 13
  • Division by Zero when training

      File "samplernn-pytorch/trainer/__init__.py", line 45, in call_plugins
        getattr(plugin, queue_name)(*args)
      File "/usr/local/lib/python3.6/site-packages/torch/utils/trainer/plugins/monitor.py", line 56, in epoch
        stats['epoch_mean'] = epoch_stats[0] / epoch_stats[1]
    ZeroDivisionError: division by zero
    

    This is with PyTorch 0.3.0.post4.

    duplicate 
    opened by LukeB42 10
  • https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/models/three_tier/three_tier.py#L496

    Is it correct that the code does not implement images2neibs? I think it's unfold in PyTorch?

    This line in the original code: https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/models/three_tier/three_tier.py#L496
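
    For what it's worth, a small sketch of how Theano's images2neibs could map onto Tensor.unfold for 1-D audio (an assumption about the correspondence, not a quote of this repo's code):

        import torch

        x = torch.arange(16.).view(1, 16)  # (batch, n_samples)
        frames = x.unfold(1, 4, 4)         # (batch, n_frames, frame_size) = (1, 4, 4)
        # non-overlapping frames of 4 samples each, like images2neibs on a 1-D signal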

    opened by skaae 7
  • Convolution dim shuffling

    In the FrameLevel forward you guys do:

        def forward(self, prev_samples, upper_tier_conditioning, hidden):
            (batch_size, _, _) = prev_samples.size()
    
            # (batch, seq_len, dim) -> (batch, dim, seq_len)
            input = prev_samples.permute(0, 2, 1)
    
            # (batch, dim, seq_len)
            # use conv1d instead of FC for speed
            input = self.input_expand(input)
    
            # (batch, dim, seq_len) -> (batch, seq_len, dim)
            input = input.permute(0, 2, 1)
            
            # add conditioning tier from previous frame 
            if upper_tier_conditioning is not None:
                input += upper_tier_conditioning
            
            # reset hidden state for TBPTT
            reset = hidden is None
            if hidden is None:
                (n_rnn, _) = self.h0.size()
                hidden = self.h0.unsqueeze(1) \
                                .expand(n_rnn, batch_size, self.dim) \
                                .contiguous()
            
            # run the RNN over the conditioned frame inputs
            (output, hidden) = self.rnn(input, hidden)
            
            # permute again so this can upsample for next context
            output = output.permute(0, 2, 1)
            output = self.upsampling(output)
            output = output.permute(0, 2, 1)
            return (output, hidden)
    
    1. Are the comments I added correct?

    2. I'd like to use a Linear layer instead of the Conv1d, just for understanding purposes. However, the dimensions don't line up when I do it that way. Any thoughts on how to reframe this in terms of a Linear layer?

    3. I assume the transposes are there so that the convolutions work out? Is that standard when using a Conv1d instead of a Linear layer?
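
    Regarding 2 and 3: if input_expand is a Conv1d with kernel_size=1 (which the permute comments suggest), it is mathematically the same as a Linear applied independently at every timestep; nn.Linear acts on the last dimension, so no permutes are needed. A small sketch to check the equivalence (hypothetical shapes, not code from this repo):

        import torch
        import torch.nn as nn

        batch, seq_len, n_frame_samples, dim = 2, 5, 16, 8
        prev_samples = torch.randn(batch, seq_len, n_frame_samples)

        conv = nn.Conv1d(n_frame_samples, dim, kernel_size=1)
        linear = nn.Linear(n_frame_samples, dim)
        linear.weight.data = conv.weight.data.squeeze(-1)  # (dim, n_frame_samples)
        linear.bias.data = conv.bias.data

        # Conv1d wants (batch, channels, seq_len), hence the permutes in forward()
        out_conv = conv(prev_samples.permute(0, 2, 1)).permute(0, 2, 1)
        # Linear acts on the last dim of (batch, seq_len, n_frame_samples) directly
        out_linear = linear(prev_samples)
        print(torch.allclose(out_conv, out_linear, atol=1e-6))  # True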

    opened by williamFalcon 4
  • No module named trainer

    Hello,

    when training, I get the following error. The module trainer doesn't seem to exist?

    
    cd .. && python train.py --exp TEST --frame_sizes 16 4 --n_rnn 2 --dataset piano
    Traceback (most recent call last):
      File "train.py", line 11, in <module>
        from trainer.plugins import (
      File "/home/vincent/samplernn-pytorch/trainer/plugins.py", line 8, in <module>
        from torch.utils.trainer.plugins.plugin import Plugin
    ModuleNotFoundError: No module named 'torch.utils.trainer'
    
    opened by ghost 2
  • No matching distribution found for torch==0.2.0.post3

    I'm using the PyTorch Docker image, and I'm extremely confused about what I'm doing wrong at this point. Any assistance is appreciated.

      root@6b27a1f07b65:~/samplernn-pytorch# pip install -r requirements.txt
      Collecting librosa==0.5.1 (from -r requirements.txt (line 1))
        Using cached librosa-0.5.1.tar.gz
      Collecting matplotlib==2.1.0 (from -r requirements.txt (line 2))
        Using cached matplotlib-2.1.0-cp36-cp36m-manylinux1_x86_64.whl
      Collecting natsort==5.1.0 (from -r requirements.txt (line 3))
        Using cached natsort-5.1.0-py2.py3-none-any.whl
      Collecting torch==0.2.0.post3 (from -r requirements.txt (line 4))
        Could not find a version that satisfies the requirement torch==0.2.0.post3 (from -r requirements.txt (line 4)) (from versions: 0.1.2, 0.1.2.post1)
      No matching distribution found for torch==0.2.0.post3 (from -r requirements.txt (line 4))

    Edit: I can get it working without CUDA. At this point, is torch==0.2.0.post3 vital?

    opened by HandsomeDevilv112 2
  • quantizer

    The quantizer doesn't work with q_levels=512. I'm not sure why, but it seems to me that it should. Maybe the epsilon is too small? With q_levels=512 you get quantized values of 512, not 511.

    I find this variant easier to read, and it works:

        samples = (q_levels - 1) * samples + 0.1  # scale [0, 1] floats up to q_levels - 1; +0.1 stays below the next level
        samples = samples.long()                  # truncate to integer levels in [0, q_levels - 1]
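
    For comparison, a hedged sketch of a quantizer that is safe for any q_levels by construction (an illustration of the idea, not the repo's actual linear_quantize):

        import torch

        def linear_quantize(samples, q_levels):
            # a sketch, assuming a float tensor: normalize to [0, 1] first
            samples = samples - samples.min()
            samples = samples / samples.max()
            # scale to [0, q_levels - 1]; the clamp guarantees no value can
            # ever land on q_levels, whatever epsilon or rounding is used
            return (samples * (q_levels - 1)).long().clamp(0, q_levels - 1)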
    
    opened by skaae 2
  • Getting runtime error for sizes of tensors not matching

    RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 705664 and 58368 in dimension 1 at /pytorch/torch/lib/TH/generic/THTensorMath.c:2897

    My batch size is 64 instead of the initial 128. It would not run with 128; I was getting a division-by-zero error with a batch size of 128. I don't know if this could be a factor, but I thought it would be worth mentioning.

    Has anyone encountered this problem? What fixes are there?

    opened by masheendream 1
  • bug in hidden state?

    Hi,

    Thanks for the good work! I am wondering: should https://github.com/deepsound-project/samplernn-pytorch/blob/master/model.py#L49 be

        h0 = torch.zeros(n_rnn, batch_size, dim)?

    I noticed the expand function is used later, and it seems that with expand the values are shared. Is this a bug?
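
    For reference, a small sketch of the distinction (my reading, not an authoritative answer): expand alone returns a view whose batch entries all share h0's storage, but the .contiguous() call that follows it in model.py materialises an independent copy, so h0 acts only as a shared initial value:

        import torch

        n_rnn, batch_size, dim = 2, 3, 4
        h0 = torch.zeros(n_rnn, dim)

        view = h0.unsqueeze(1).expand(n_rnn, batch_size, dim)  # a view: shares h0's storage
        copy = view.contiguous()                               # materialises an independent copy

        copy[0, 0, 0] = 1.0     # writing to the copy...
        print(h0[0, 0].item())  # ...leaves h0 untouched: prints 0.0
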
    Many thanks!

    Qiuqiang

    opened by qiuqiangkong 1
  • sample generated is noise

    Hi all,

    I have been training the model using Google Colab. However, the generated samples are noise all the time. Please give me some suggestions ;)

    Best regards,

    Zixun

    opened by guozixunnicolas 1
  • download dataset from youtube

    When I run

        cd datasets
        ./download-from-youtube.sh "https://www.youtube.com/watch?v=EhO_MrRfftU" 8 piano

    the following shows up:

        '.' is not recognized as an internal or external command, operable program or batch file.

    Can anyone tell me how to deal with this?

    opened by guozixunnicolas 1
  • Which torch version to get?

    Hi, some people have reported that putting torch==0.4.1 in the requirements works for them. For me it produces this error:

      /content/samplernn-pytorch/model.py:60: UserWarning: nn.init.kaiming_uniform is now deprecated in favor of nn.init.kaiming_uniform_.
        init.kaiming_uniform(self.input_expand.weight)
      /content/samplernn-pytorch/model.py:61: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
        init.constant(self.input_expand.bias, 0)
      /content/samplernn-pytorch/nn.py:48: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
        nn.init.uniform(tensor, -math.sqrt(3 / fan_in), math.sqrt(3 / fan_in))
      /content/samplernn-pytorch/model.py:76: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
        init.constant(getattr(self.rnn, 'bias_ih_l{}'.format(i)), 0)
      /content/samplernn-pytorch/nn.py:62: UserWarning: nn.init.orthogonal is now deprecated in favor of nn.init.orthogonal_.
        init(chunk)
      /content/samplernn-pytorch/model.py:82: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
        init.constant(getattr(self.rnn, 'bias_hh_l{}'.format(i)), 0)
      /content/samplernn-pytorch/nn.py:31: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
        nn.init.constant(self.bias, 0)
      /content/samplernn-pytorch/model.py:90: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
        self.upsampling.conv_t.weight, -np.sqrt(6 / dim), np.sqrt(6 / dim)
      /content/samplernn-pytorch/model.py:92: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
        init.constant(self.upsampling.bias, 0)
      /content/samplernn-pytorch/model.py:141: UserWarning: nn.init.kaiming_uniform is now deprecated in favor of nn.init.kaiming_uniform_.
        init.kaiming_uniform(self.input.weight)
      /content/samplernn-pytorch/model.py:150: UserWarning: nn.init.kaiming_uniform is now deprecated in favor of nn.init.kaiming_uniform_.
        init.kaiming_uniform(self.hidden.weight)
      /content/samplernn-pytorch/model.py:151: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
        init.constant(self.hidden.bias, 0)
      /content/samplernn-pytorch/model.py:161: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
        init.constant(self.output.bias, 0)
      Traceback (most recent call last):
        File "train.py", line 360, in <module>
          main(**vars(parser.parse_args()))
        File "train.py", line 258, in main
          trainer.run(params['epoch_limit'])
        File "/content/samplernn-pytorch/trainer/__init__.py", line 56, in run
          self.train()
        File "/content/samplernn-pytorch/trainer/__init__.py", line 61, in train
          enumerate(self.dataset, self.iterations + 1):
        File "/content/samplernn-pytorch/dataset.py", line 51, in __iter__
          for batch in super().__iter__():
        File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 314, in __next__
          batch = self.collate_fn([self.dataset[i] for i in indices])
        File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 314, in <listcomp>
          batch = self.collate_fn([self.dataset[i] for i in indices])
        File "/content/samplernn-pytorch/dataset.py", line 34, in __getitem__
          torch.from_numpy(seq), self.q_levels
      RuntimeError: PyTorch was compiled without NumPy support

    I'm using this Colab notebook btw: https://drive.google.com/file/d/13tVz73FXyG8Xvidl-SqyxNtBwozAlvth/view?usp=sharing

    Any help would be appreciated!

    opened by Flesco 0
  • OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect:

    Hi! I have been having a few issues running the code due to older dependencies. The closest I have gotten is running the Colab notebook referenced in the issue comments. The Colab notebook works, but when running the commands in Anaconda I run into this issue: OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect. Does anyone know what is wrong?

    opened by Bjerken 0
  • Working training in colab but no sound

    Hi, I managed to set up a Colab for training. The training runs, but at least the first 100 generated samples have no sound, just clicks. Do you know how many epochs it should take? Or maybe something is incompatible between versions, or something else is going wrong.

    https://colab.research.google.com/drive/1fRhzNtRmdllD74mLzfyCy8SWuMT7sB3m

    Best

    opened by pabloriera 7
  • Conflict between PyTorch version and CUDA version

    Is it possible to run train.py with CUDA 9+?

    train.py attempts to import torch.utils.trainer, which seems to have been removed from PyTorch around version 1.0.0. However, I think 1.0.0 or newer is required for it to run on GPUs with CUDA 9+.

    What's the easiest way to resolve this?

    opened by Sinnerboy89 1
Owner
DeepSound
DeepSound is a project aiming to create a system capable of generating music using deep learning methods, led by two students pursuing their Master's degrees.
🐤 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation

🐤 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation. Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji

Rendi Chevi 156 Jan 9, 2023
PyTorch implementation of the end-to-end coreference resolution model with different higher-order inference methods.

End-to-End Coreference Resolution with Different Higher-Order Inference Methods This repository contains the implementation of the paper: Revealing th

Liyan 52 Jan 4, 2023
An implementation for `Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction`

Text2Event An implementation for Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction Please contact Yaojie Lu (@

Roger 153 Jan 7, 2023
A complete end-to-end demonstration in which we collect training data in Unity and use that data to train a deep neural network to predict the pose of a cube. This model is then deployed in a simulated robotic pick-and-place task.

Object Pose Estimation Demo This tutorial will go through the steps necessary to perform pose estimation with a UR3 robotic arm in Unity. You’ll gain

Unity Technologies 187 Dec 24, 2022
PyTorch implementation of "A Two-Stage End-to-End System for Speech-in-Noise Hearing Aid Processing"

Implementation of the Sheffield entry for the first Clarity enhancement challenge (CEC1) This repository contains the PyTorch implementation of "A Two

null 10 Aug 19, 2022
A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"

張致強 14 Dec 2, 2022
Code & Models for 3DETR - an End-to-end transformer model for 3D object detection

3DETR: An End-to-End Transformer Model for 3D Object Detection PyTorch implementation and models for 3DETR. 3DETR (3D DEtection TRansformer) is a simp

Facebook Research 487 Dec 31, 2022
Task-based end-to-end model learning in stochastic optimization

Task-based End-to-end Model Learning in Stochastic Optimization This repository is by Priya L. Donti, Brandon Amos, and J. Zico Kolter and contains th

CMU Locus Lab 164 Dec 29, 2022
Spatio-Temporal Entropy Model (STEM) for end-to-end learned video compression.

Spatio-Temporal Entropy Model A Pytorch Reproduction of Spatio-Temporal Entropy Model (STEM) for end-to-end learned video compression. More details can

null 16 Nov 28, 2022
An end-to-end image translation model with weight-map for color constancy

CCUnet An end-to-end image translation model with weight-map for color constancy 1. Download the dataset (take Colorchecker_recommended dataset as an

Jianhui Qiu 1 Dec 21, 2021
End-to-end face detection, cropping, norm estimation, and landmark detection in a single onnx model

onnx-facial-lmk-detector End-to-end face detection, cropping, norm estimation, and landmark detection in a single onnx model, model.onnx. Demo You can

atksh 42 Dec 30, 2022
Neural Dynamic Policies for End-to-End Sensorimotor Learning

This is a PyTorch based implementation for our NeurIPS 2020 paper on Neural Dynamic Policies for end-to-end sensorimotor learning.

Shikhar Bahl 47 Dec 11, 2022
[ICCV'21] NEAT: Neural Attention Fields for End-to-End Autonomous Driving

NEAT: Neural Attention Fields for End-to-End Autonomous Driving Paper | Supplementary | Video | Poster | Blog This repository is for the ICCV 2021 pap

null 254 Jan 2, 2023
BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation This is a demo implementation of BYOL for Audio (BYOL-A), a self-sup

NTT Communication Science Laboratories 160 Jan 4, 2023
This repository contains a PyTorch implementation of "AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis".

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis | Project Page | Paper | PyTorch implementation for the paper "AD-NeRF: Audio

null 551 Dec 29, 2022
NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling For Official repo of NU-Wave: A Diffusion Probabilistic Model for Neural Audio Up

Rishikesh (ऋषिकेश) 38 Oct 11, 2022
NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling @ INTERSPEECH 2021 Accepted

NU-Wave — Official PyTorch Implementation NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling Junhyeok Lee, Seungu Han @ MINDsLab Inc

MINDs Lab 242 Dec 23, 2022
An end-to-end PyTorch framework for image and video classification

What's New: March 2021: Added RegNetZ models November 2020: Vision Transformers now available, with training recipes! 2020-11-20: Classy Vision v0.5 R

Facebook Research 1.5k Dec 31, 2022
Pytorch library for end-to-end transformer models training and serving

Mikhail Grankin 768 Jan 1, 2023