PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Overview

A PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model.

[Figure: a visual representation of the SampleRNN architecture]

It's based on the reference implementation in Theano: https://github.com/soroushmehr/sampleRNN_ICLR2017. Unlike the Theano version, our code allows training models with an arbitrary number of tiers, whereas the original implementation supports a maximum of 3. However, it supports only GRU units (no LSTM). For more details and the motivation behind rewriting this model in PyTorch, see our blog post: http://deepsound.io/samplernn_pytorch.html.

Dependencies

This code requires Python 3.5+ and PyTorch 0.1.12+. Installation instructions for PyTorch are available on their website: http://pytorch.org/. You can install the rest of the dependencies by running pip install -r requirements.txt.

Datasets

We provide a script for creating datasets from single-video YouTube mixes. It downloads a mix, converts it to wav, and splits it into equal-length chunks. To run it you need youtube-dl (a recent version; the latest version from pip should be okay) and ffmpeg. To create an example dataset (4 hours of piano music split into 8-second chunks), run:

cd datasets
./download-from-youtube.sh "https://www.youtube.com/watch?v=EhO_MrRfftU" 8 piano

You can also prepare a dataset yourself. It should be a directory in datasets/ filled with equal-length wav files. Alternatively, you can create your own dataset format by subclassing torch.utils.data.Dataset. It's easy; take a look at dataset.FolderDataset in this repo for an example, or at the sketch below.
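
A minimal sketch of such a subclass (a hypothetical example, not the repo's actual dataset.FolderDataset; it assumes a directory of equal-length wav files and uses librosa, which is already in requirements.txt):

    import os

    import librosa
    import torch
    from torch.utils.data import Dataset


    class WavFolderDataset(Dataset):
        """Serves equal-length wav files from a directory, one tensor per file."""

        def __init__(self, path, sample_rate=16000):
            self.file_names = sorted(
                os.path.join(path, name)
                for name in os.listdir(path) if name.endswith('.wav')
            )
            self.sample_rate = sample_rate

        def __getitem__(self, index):
            # librosa.load resamples to sample_rate and returns float32 in [-1, 1]
            audio, _ = librosa.load(self.file_names[index],
                                    sr=self.sample_rate, mono=True)
            return torch.from_numpy(audio)

        def __len__(self):
            return len(self.file_names)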

Training

To train the model you need to run train.py. All model hyperparameters can be set on the command line. Most hyperparameters have sensible default values, so you don't need to provide all of them. Run python train.py -h for details. To train on the piano dataset using the best hyperparameters we've found, run:

python train.py --exp TEST --frame_sizes 16 4 --n_rnn 2 --dataset piano

The results (training log, loss plots, model checkpoints and generated samples) will be saved in results/.

We also have an option to monitor the metrics using CometML. To use it, just pass your API key to train.py as the --comet_key parameter.
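
For example (YOUR_API_KEY is a placeholder for your own key):

python train.py --exp TEST --frame_sizes 16 4 --n_rnn 2 --dataset piano --comet_key YOUR_API_KEY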

Comments
  • What does n_frame_samples represent?

    The terminology is a bit confusing. What does n_frame_samples mean? Is it the number of samples per frame, or the number of frames?

    Is the RNN taking in a sequence of frames (i.e. one frame per timestep), with n_frame_samples being the dimension of each frame?
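
    A hedged reading of the code (consistent with the SampleRNN paper's tier structure, though not an authoritative quote of model.py): each frame-level tier advances one frame per RNN timestep, and n_frame_samples is the number of raw audio samples a single frame of that tier covers, i.e. the cumulative product of the frame sizes up to that tier:

        import numpy as np

        frame_sizes = [16, 4]                      # as in the README's training command
        n_frame_samples = np.cumprod(frame_sizes)  # array([16, 64])
        # bottom tier: one frame spans 16 raw samples, and that is the frame's
        # input dimension; the tier above spans 16 * 4 = 64 raw samples per frame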

    opened by williamFalcon 13
  • Division by Zero when training

      File "samplernn-pytorch/trainer/__init__.py", line 45, in call_plugins
        getattr(plugin, queue_name)(*args)
      File "/usr/local/lib/python3.6/site-packages/torch/utils/trainer/plugins/monitor.py", line 56, in epoch
        stats['epoch_mean'] = epoch_stats[0] / epoch_stats[1]
    ZeroDivisionError: division by zero
    

    This is with PyTorch 0.3.0.post4.

    duplicate 
    opened by LukeB42 10
  • https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/models/three_tier/three_tier.py#L496

    Is it correct that the code does not implement images2neibs? I think it's unfold in PyTorch?

    This line in the original code: https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/models/three_tier/three_tier.py#L496
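
    For what it's worth, a small sketch of how Theano's images2neibs could map onto Tensor.unfold for 1-D audio (an assumption about the correspondence, not a quote of this repo's code):

        import torch

        x = torch.arange(16.).view(1, 16)  # (batch, n_samples)
        frames = x.unfold(1, 4, 4)         # (batch, n_frames, frame_size) = (1, 4, 4)
        # non-overlapping frames of 4 samples each, like images2neibs on a 1-D signal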

    opened by skaae 7
  • Convolution dim shuffling

    In the FrameLevel forward you guys do:

        def forward(self, prev_samples, upper_tier_conditioning, hidden):
            (batch_size, _, _) = prev_samples.size()
    
            # (batch, seq_len, dim) -> (batch, dim, seq_len)
            input = prev_samples.permute(0, 2, 1)
    
            # (batch, dim, seq_len)
            # use conv1d instead of FC for speed
            input = self.input_expand(input)
    
            # (batch, dim, seq_len) -> (batch, seq_len, dim)
            input = input.permute(0, 2, 1)
            
            # add conditioning tier from previous frame 
            if upper_tier_conditioning is not None:
                input += upper_tier_conditioning
            
            # reset hidden state for TBPTT
            reset = hidden is None
            if hidden is None:
                (n_rnn, _) = self.h0.size()
                hidden = self.h0.unsqueeze(1) \
                                .expand(n_rnn, batch_size, self.dim) \
                                .contiguous()
            
            # run the RNN over the conditioned frame inputs
            (output, hidden) = self.rnn(input, hidden)
            
            # permute again so this can upsample for next context
            output = output.permute(0, 2, 1)
            output = self.upsampling(output)
            output = output.permute(0, 2, 1)
            return (output, hidden)
    
    1. Are the comments I added correct?

    2. I'd like to use a Linear layer instead of the Conv1d, just for understanding purposes. However, the dimensions don't line up when I do it that way. Any thoughts on how to reframe this in terms of a Linear layer?

    3. I assume the transposes are there so that the convolutions work out? Is that standard when using a Conv1d instead of a Linear layer?
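
    Regarding 2 and 3: if input_expand is a Conv1d with kernel_size=1 (which the permute comments suggest), it is mathematically the same as a Linear applied independently at every timestep; nn.Linear acts on the last dimension, so no permutes are needed. A small sketch to check the equivalence (hypothetical shapes, not code from this repo):

        import torch
        import torch.nn as nn

        batch, seq_len, n_frame_samples, dim = 2, 5, 16, 8
        prev_samples = torch.randn(batch, seq_len, n_frame_samples)

        conv = nn.Conv1d(n_frame_samples, dim, kernel_size=1)
        linear = nn.Linear(n_frame_samples, dim)
        linear.weight.data = conv.weight.data.squeeze(-1)  # (dim, n_frame_samples)
        linear.bias.data = conv.bias.data

        # Conv1d wants (batch, channels, seq_len), hence the permutes in forward()
        out_conv = conv(prev_samples.permute(0, 2, 1)).permute(0, 2, 1)
        # Linear acts on the last dim of (batch, seq_len, n_frame_samples) directly
        out_linear = linear(prev_samples)
        print(torch.allclose(out_conv, out_linear, atol=1e-6))  # True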

    opened by williamFalcon 4
  • No module named trainer

    Hello,

    when training, I get the following error. The module trainer doesn't seem to exist?

    
    cd .. && python train.py --exp TEST --frame_sizes 16 4 --n_rnn 2 --dataset piano
    Traceback (most recent call last):
      File "train.py", line 11, in <module>
        from trainer.plugins import (
      File "/home/vincent/samplernn-pytorch/trainer/plugins.py", line 8, in <module>
        from torch.utils.trainer.plugins.plugin import Plugin
    ModuleNotFoundError: No module named 'torch.utils.trainer'
    
    opened by ghost 2
  • No matching distribution found for torch==0.2.0.post3

    I'm using the PyTorch Docker image, and I'm extremely confused about what I'm doing wrong at this point. Any assistance is appreciated.

      root@6b27a1f07b65:~/samplernn-pytorch# pip install -r requirements.txt
      Collecting librosa==0.5.1 (from -r requirements.txt (line 1))
        Using cached librosa-0.5.1.tar.gz
      Collecting matplotlib==2.1.0 (from -r requirements.txt (line 2))
        Using cached matplotlib-2.1.0-cp36-cp36m-manylinux1_x86_64.whl
      Collecting natsort==5.1.0 (from -r requirements.txt (line 3))
        Using cached natsort-5.1.0-py2.py3-none-any.whl
      Collecting torch==0.2.0.post3 (from -r requirements.txt (line 4))
        Could not find a version that satisfies the requirement torch==0.2.0.post3 (from -r requirements.txt (line 4)) (from versions: 0.1.2, 0.1.2.post1)
      No matching distribution found for torch==0.2.0.post3 (from -r requirements.txt (line 4))

    Edit: I can get it working without CUDA. At this point, is torch==0.2.0.post3 vital?

    opened by HandsomeDevilv112 2
  • quantizer

    The quantizer doesn't work with q_levels=512. I'm not sure why, but it seems to me that it should. Maybe the epsilon is too small? With q_levels=512 you get quantized values of 512, not 511.

    I find this variant easier to read, and it works:

        samples = (q_levels - 1) * samples + 0.1  # scale [0, 1] floats up to q_levels - 1; +0.1 stays below the next level
        samples = samples.long()                  # truncate to integer levels in [0, q_levels - 1]
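
    For comparison, a hedged sketch of a quantizer that is safe for any q_levels by construction (an illustration of the idea, not the repo's actual linear_quantize):

        import torch

        def linear_quantize(samples, q_levels):
            # a sketch, assuming a float tensor: normalize to [0, 1] first
            samples = samples - samples.min()
            samples = samples / samples.max()
            # scale to [0, q_levels - 1]; the clamp guarantees no value can
            # ever land on q_levels, whatever epsilon or rounding is used
            return (samples * (q_levels - 1)).long().clamp(0, q_levels - 1)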
    
    opened by skaae 2
  • Getting runtime error for sizes of tensors not matching

    RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 705664 and 58368 in dimension 1 at /pytorch/torch/lib/TH/generic/THTensorMath.c:2897

    My batch size is 64 instead of the initial 128. It would not run with 128; I was getting a division-by-zero error with a batch size of 128. I don't know if this could be a factor, but I thought it would be worth mentioning.

    Has anyone encountered this problem? What fixes are there?

    opened by masheendream 1
  • bug in hidden state?

    Hi,

    Thanks for the good work! I am wondering: should https://github.com/deepsound-project/samplernn-pytorch/blob/master/model.py#L49 be

        h0 = torch.zeros(n_rnn, batch_size, dim)?

    I noticed the expand function is used later, and it seems that with expand the values are shared. Is this a bug?
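
    For reference, a small sketch of the distinction (my reading, not an authoritative answer): expand alone returns a view whose batch entries all share h0's storage, but the .contiguous() call that follows it in model.py materialises an independent copy, so h0 acts only as a shared initial value:

        import torch

        n_rnn, batch_size, dim = 2, 3, 4
        h0 = torch.zeros(n_rnn, dim)

        view = h0.unsqueeze(1).expand(n_rnn, batch_size, dim)  # a view: shares h0's storage
        copy = view.contiguous()                               # materialises an independent copy

        copy[0, 0, 0] = 1.0     # writing to the copy...
        print(h0[0, 0].item())  # ...leaves h0 untouched: prints 0.0
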
    Many thanks!

    Qiuqiang

    opened by qiuqiangkong 1
  • sample generated is noise

    Hi all,

    I have been training the model using Google Colab. However, the generated samples are noise all the time. Please give me some suggestions ;)

    Best regards,

    Zixun

    opened by guozixunnicolas 1
  • download dataset from youtube

    When I run

        cd datasets
        ./download-from-youtube.sh "https://www.youtube.com/watch?v=EhO_MrRfftU" 8 piano

    the following shows up:

        '.' is not recognized as an internal or external command, operable program or batch file.

    Can anyone tell me how to deal with this?

    opened by guozixunnicolas 1
  • Which torch version to get?

    Hi, some people have reported that putting torch==0.4.1 in the requirements works for them. For me it produces this error:

      /content/samplernn-pytorch/model.py:60: UserWarning: nn.init.kaiming_uniform is now deprecated in favor of nn.init.kaiming_uniform_.
        init.kaiming_uniform(self.input_expand.weight)
      /content/samplernn-pytorch/model.py:61: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
        init.constant(self.input_expand.bias, 0)
      /content/samplernn-pytorch/nn.py:48: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
        nn.init.uniform(tensor, -math.sqrt(3 / fan_in), math.sqrt(3 / fan_in))
      /content/samplernn-pytorch/model.py:76: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
        init.constant(getattr(self.rnn, 'bias_ih_l{}'.format(i)), 0)
      /content/samplernn-pytorch/nn.py:62: UserWarning: nn.init.orthogonal is now deprecated in favor of nn.init.orthogonal_.
        init(chunk)
      /content/samplernn-pytorch/model.py:82: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
        init.constant(getattr(self.rnn, 'bias_hh_l{}'.format(i)), 0)
      /content/samplernn-pytorch/nn.py:31: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
        nn.init.constant(self.bias, 0)
      /content/samplernn-pytorch/model.py:90: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
        self.upsampling.conv_t.weight, -np.sqrt(6 / dim), np.sqrt(6 / dim)
      /content/samplernn-pytorch/model.py:92: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
        init.constant(self.upsampling.bias, 0)
      /content/samplernn-pytorch/model.py:141: UserWarning: nn.init.kaiming_uniform is now deprecated in favor of nn.init.kaiming_uniform_.
        init.kaiming_uniform(self.input.weight)
      /content/samplernn-pytorch/model.py:150: UserWarning: nn.init.kaiming_uniform is now deprecated in favor of nn.init.kaiming_uniform_.
        init.kaiming_uniform(self.hidden.weight)
      /content/samplernn-pytorch/model.py:151: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
        init.constant(self.hidden.bias, 0)
      /content/samplernn-pytorch/model.py:161: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
        init.constant(self.output.bias, 0)
      Traceback (most recent call last):
        File "train.py", line 360, in <module>
          main(**vars(parser.parse_args()))
        File "train.py", line 258, in main
          trainer.run(params['epoch_limit'])
        File "/content/samplernn-pytorch/trainer/__init__.py", line 56, in run
          self.train()
        File "/content/samplernn-pytorch/trainer/__init__.py", line 61, in train
          enumerate(self.dataset, self.iterations + 1):
        File "/content/samplernn-pytorch/dataset.py", line 51, in __iter__
          for batch in super().__iter__():
        File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 314, in __next__
          batch = self.collate_fn([self.dataset[i] for i in indices])
        File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 314, in <listcomp>
          batch = self.collate_fn([self.dataset[i] for i in indices])
        File "/content/samplernn-pytorch/dataset.py", line 34, in __getitem__
          torch.from_numpy(seq), self.q_levels
      RuntimeError: PyTorch was compiled without NumPy support

    I'm using this Colab notebook btw: https://drive.google.com/file/d/13tVz73FXyG8Xvidl-SqyxNtBwozAlvth/view?usp=sharing

    Any help would be appreciated!

    opened by Flesco 0
  • OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect:

    Hi! I have been having a few issues running the code due to older dependencies. The closest I have gotten is running the Colab notebook referenced in the issue comments. The Colab notebook works, but when running the commands in Anaconda I run into this issue: OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect. Does anyone know what is wrong?

    opened by Bjerken 0
  • Working training in colab but no sound

    Hi, I managed to set up a Colab for training. The training runs, but at least the first 100 generated samples have no sound, just clicks. Do you know how many epochs it should take? Or maybe something is incompatible between versions, or something else is going wrong.

    https://colab.research.google.com/drive/1fRhzNtRmdllD74mLzfyCy8SWuMT7sB3m

    Best

    opened by pabloriera 7
  • Conflict between PyTorch version and CUDA version

    Is it possible to run train.py with CUDA 9+?

    train.py attempts to import torch.utils.trainer, which seems to have been removed from PyTorch around version 1.0.0. However, I think 1.0.0 or newer is required for it to run on GPUs with CUDA 9+.

    What's the easiest way to resolve this?

    opened by Sinnerboy89 1
Owner
DeepSound
DeepSound is a project aiming to create a system capable of generating music using deep learning methods, led by two students pursuing their Master's degrees.
🐤 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation

🐤 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation. Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji

Rendi Chevi 156 Jan 9, 2023
PyTorch implementation of the end-to-end coreference resolution model with different higher-order inference methods.

End-to-End Coreference Resolution with Different Higher-Order Inference Methods This repository contains the implementation of the paper: Revealing th

Liyan 52 Jan 4, 2023
An implementation for `Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction`

Text2Event An implementation for Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction Please contact Yaojie Lu (@

Roger 153 Jan 7, 2023
A complete end-to-end demonstration in which we collect training data in Unity and use that data to train a deep neural network to predict the pose of a cube. This model is then deployed in a simulated robotic pick-and-place task.

Object Pose Estimation Demo This tutorial will go through the steps necessary to perform pose estimation with a UR3 robotic arm in Unity. You’ll gain

Unity Technologies 187 Dec 24, 2022
PyTorch implementation of "A Two-Stage End-to-End System for Speech-in-Noise Hearing Aid Processing"

Implementation of the Sheffield entry for the first Clarity enhancement challenge (CEC1) This repository contains the PyTorch implementation of "A Two

null 10 Aug 19, 2022
A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"

張致強 14 Dec 2, 2022
Code & Models for 3DETR - an End-to-end transformer model for 3D object detection

3DETR: An End-to-End Transformer Model for 3D Object Detection PyTorch implementation and models for 3DETR. 3DETR (3D DEtection TRansformer) is a simp

Facebook Research 487 Dec 31, 2022
Task-based end-to-end model learning in stochastic optimization

Task-based End-to-end Model Learning in Stochastic Optimization This repository is by Priya L. Donti, Brandon Amos, and J. Zico Kolter and contains th

CMU Locus Lab 164 Dec 29, 2022
Spatio-Temporal Entropy Model (STEM) for end-to-end learned video compression.

Spatio-Temporal Entropy Model A Pytorch Reproduction of Spatio-Temporal Entropy Model (STEM) for end-to-end learned video compression. More details can

null 16 Nov 28, 2022
An end-to-end image translation model with weight-map for color constancy

CCUnet An end-to-end image translation model with weight-map for color constancy 1. Download the dataset (take Colorchecker_recommended dataset as an

Jianhui Qiu 1 Dec 21, 2021
End-to-end face detection, cropping, norm estimation, and landmark detection in a single onnx model

onnx-facial-lmk-detector End-to-end face detection, cropping, norm estimation, and landmark detection in a single onnx model, model.onnx. Demo You can

atksh 42 Dec 30, 2022
Neural Dynamic Policies for End-to-End Sensorimotor Learning

This is a PyTorch based implementation for our NeurIPS 2020 paper on Neural Dynamic Policies for end-to-end sensorimotor learning.

Shikhar Bahl 47 Dec 11, 2022
[ICCV'21] NEAT: Neural Attention Fields for End-to-End Autonomous Driving

NEAT: Neural Attention Fields for End-to-End Autonomous Driving Paper | Supplementary | Video | Poster | Blog This repository is for the ICCV 2021 pap

null 254 Jan 2, 2023
BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation This is a demo implementation of BYOL for Audio (BYOL-A), a self-sup

NTT Communication Science Laboratories 160 Jan 4, 2023
This repository contains a PyTorch implementation of "AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis".

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis | Project Page | Paper | PyTorch implementation for the paper "AD-NeRF: Audio

null 551 Dec 29, 2022
NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling For Official repo of NU-Wave: A Diffusion Probabilistic Model for Neural Audio Up

Rishikesh (ऋषिकेश) 38 Oct 11, 2022
NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling @ INTERSPEECH 2021 Accepted

NU-Wave — Official PyTorch Implementation NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling Junhyeok Lee, Seungu Han @ MINDsLab Inc

MINDs Lab 242 Dec 23, 2022
An end-to-end PyTorch framework for image and video classification

What's New: March 2021: Added RegNetZ models November 2020: Vision Transformers now available, with training recipes! 2020-11-20: Classy Vision v0.5 R

Facebook Research 1.5k Dec 31, 2022
Pytorch library for end-to-end transformer models training and serving

Mikhail Grankin 768 Jan 1, 2023