An implementation of WaveNet with fast generation

Overview

pytorch-wavenet

This is an implementation of the WaveNet architecture, as described in the original paper by van den Oord et al.

Features

  • Automatic creation of a dataset (training and validation/test set) from all sound files (.wav, .aiff, .mp3) in a directory
  • Efficient multithreaded data loading
  • Logging to TensorBoard (Training loss, validation loss, validation accuracy, parameter and gradient histograms, generated samples)
  • Fast generation, based on the Fast WaveNet generation algorithm

Requirements

  • python 3
  • pytorch 0.3
  • numpy
  • librosa
  • jupyter
  • tensorflow for TensorBoard logging

Demo

For an introduction to using this model, take a look at the WaveNet demo notebook. You can find audio clips generated by a simple trained model in the generated samples directory.
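
The typical workflow mirrors the notebook: build a WaveNetModel, wrap a directory of audio in a WavenetDataset, train with a WavenetTrainer, then sample with the fast generation routine. A minimal sketch, assuming the module, class and argument names used in the demo notebook (treat them as approximations and check the notebook for the exact signatures):

    from wavenet_model import WaveNetModel
    from audio_data import WavenetDataset
    from wavenet_training import WavenetTrainer

    # Assumed names and arguments based on the demo notebook, not a verified API reference.
    model = WaveNetModel(layers=10, blocks=3, classes=256, output_length=16)
    data = WavenetDataset(dataset_file='train_samples/dataset.npz',
                          item_length=model.receptive_field + model.output_length - 1,
                          target_length=model.output_length,
                          file_location='train_samples')
    trainer = WavenetTrainer(model=model, dataset=data)
    trainer.train(batch_size=16, epochs=10)

    generated = model.generate_fast(num_samples=16000)  # fast, sample-by-sample generation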

Comments
  • Receptive Field

    Hello @vincentherrmann, great work on WaveNet in PyTorch. I am not able to understand your receptive field calculation. Can you please let me know how it is being calculated? I was thinking the receptive field size = blocks * (2 ^ (layers + 1)).

    Can you please let me know how this is being calculated in your code, what formula is being used, and what the justification for that formula is?
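
    For reference, a minimal sketch of the textbook receptive field formula for stacked dilated causal convolutions with kernel size 2 (this is the standard WaveNet calculation, not necessarily the exact expression used in this repository):

      def receptive_field(blocks, layers, kernel_size=2):
          # Dilations run 1, 2, 4, ..., 2**(layers - 1) inside every block;
          # each layer adds dilation * (kernel_size - 1) samples of context.
          per_block = (2 ** layers - 1) * (kernel_size - 1)
          return blocks * per_block + 1

      # e.g. 3 blocks of 10 layers: 3 * (2**10 - 1) + 1 = 3070 samples
      print(receptive_field(blocks=3, layers=10))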

    opened by krishnakt031990 4
  • The dilation function doubled the batch size different from the paper

    After dilation_func, the batch_size is doubled. Is there any effect on filter_conv and gate_conv? If there is no effect, maybe cutting off this unused data could save time.

    opened by seaniezhao 2
  • Question about size of receptive field vs input length from dataset

    Hi @vincentherrmann. Thanks a lot for sharing, I'm learning a great deal through your code! This isn't an issue, only a question about the code.

    Visually, you describe in the code the model input (= receptive_field?) and target as:

               |----receptive_field----|
                                     |--output_length--|
     example:  | | | | | | | | | | | | | | | | | | | | |
     target:                           | | | | | | | | | |
    

    You also said a few days ago in a similar thread:

    The item_length is the number of samples the network gets as input during training and output length is the number of consecutive samples the network outputs. If we were to output only one sample (output length = 1), then there would be item_length=model.receptive_field. But for each additional output sample, we also need an additional input sample, so item_length=model.receptive_field + (output_length-1). Of course during generation we have to set output_length=1.

    However, from my observation the target data of output_length=16 shares the same values as the end of the input sequence generated by WavenetDataset, apart from the last value. Shouldn't the target sequence be the next sequence of data following the input instead? Put the opposite way, I don't understand why the target sequence has an output_length-1 overlap with the end of the input; shouldn't it be the future data to be predicted, of length output_length? And shouldn't the one_hot input sequence be of length model.receptive_field?

    To keep it visual like in code, I observe the following:

               |---------one hot input by dataset--------|
               |----receptive_field----|
                                         |--target_length--|
     example:  | | | | | | | | | | | | | | | | | | | | | | |
     target:                             | | | | | | | | | |
    

    Any pointers would be greatly appreciated :) If I had to guess, I'd say I'm missing something within the training loop; maybe it uses a moving window of size receptive_field to predict the next index one sample at a time?
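
    For what it's worth, the overlap is exactly what next-sample prediction produces when all output positions are trained at once: each of the output_length outputs predicts the sample one step ahead of its own input position. A toy sketch of that indexing (illustrative numbers, assuming the dataset yields one extra sample so the target can be shifted by one):

      import numpy as np

      receptive_field = 12          # assumed toy numbers
      output_length = 4
      item_length = receptive_field + output_length - 1

      data = np.arange(100)                   # stand-in for quantized audio
      window = data[0:item_length + 1]        # one extra sample for the shift

      model_input = window[:item_length]      # what the network sees
      target = window[-output_length:]        # next-sample targets

      # Every output position predicts the sample one step ahead of its input,
      # so target overlaps the end of model_input by output_length - 1 samples
      # and adds exactly one genuinely "future" value at the end.
      print(model_input[-output_length:])     # [11 12 13 14]
      print(target)                           # [12 13 14 15]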

    opened by ironflood 2
  • Generation error in mu_law_expansion function

    Hi there! I'm running into an error and I'm not sure if it's particular to my setup or something deeper.

    When I run generate_script.py, everything seems to go fine until I get to the final mu_law_expansion step, at which point I get first

    RuntimeWarning: overflow encountered in exp s = np.sign(data) * (np.exp(np.abs(data) * np.log(mu + 1)) - 1) / mu

    and then a fatal

    librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere

    It's the np.exp that blows up into enormous numbers.

    The data coming into the mu_law_expansion function is all in the range 0-255…

    [ 45., 202., 198., 194., 115., …

    and then after np.abs(data) * np.log(mu +1) it looks like this…

    [ 249.70842382, 1120.91336915, 1098.71706481, 1076.52076047, 638.14374976 …

    which np.exp blows up:

    [2.79892042e+108, inf, inf, inf, 1.38774344e+277, …

    Looking at the code, I have some suspicions about what might be happening, but before I get too far into fiddling with it, I want to confirm that generate_script.py works on your end.
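
    For context, mu-law expansion expects input in [-1, 1]; feeding it raw class indices in [0, 255] makes the exponential overflow exactly as shown above. A minimal sketch of the usual fix, rescaling the indices before expansion (illustrative only, not a confirmed patch for generate_script.py):

      import numpy as np

      def mu_law_expansion(data, mu=255):
          # Inverse mu-law companding; `data` must already lie in [-1, 1].
          return np.sign(data) * (np.exp(np.abs(data) * np.log(mu + 1)) - 1) / mu

      quantized = np.array([45., 202., 198., 194., 115.])  # class indices in [0, 255]
      normalized = quantized / 127.5 - 1.0                 # rescale to [-1, 1] first
      print(mu_law_expansion(normalized))                  # finite values, no overflow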

    Thanks for sharing this codebase and the accompanying documentation—it's super impressive.

    opened by robinsloan 2
  • Dilated Causal Convolution Architecture: Skip, Gated and Residual Convs reference

    I'm trying to figure out the convolutional architecture of dilated causal convolutions as stated in the WaveNet paper.

    I found that this implementation adds skip, gated and residual convolutions, but I couldn't find any reference to this in the original paper.

    I understand that this kind of convolution was not originally created by WaveNet and has been used in many other contexts since the 1980s, but I'd like to know whether WaveNet really uses this architecture or whether it is just an optimization by the author of this repository.
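
    For reference, the gated activation unit together with residual and skip connections is described in Sections 2.3 and 2.4 of the original WaveNet paper, so it is part of the published architecture rather than an addition by this repository. A minimal sketch of one such block (illustrative layer names and shapes, not the exact module from this code base):

      import torch
      import torch.nn as nn

      class GatedResidualBlock(nn.Module):
          # One dilated layer: z = tanh(W_f * x) * sigmoid(W_g * x),
          # with 1x1 convolutions producing the residual and skip outputs.
          def __init__(self, channels, skip_channels, dilation):
              super().__init__()
              self.filter_conv = nn.Conv1d(channels, channels, 2, dilation=dilation)
              self.gate_conv = nn.Conv1d(channels, channels, 2, dilation=dilation)
              self.residual_conv = nn.Conv1d(channels, channels, 1)
              self.skip_conv = nn.Conv1d(channels, skip_channels, 1)

          def forward(self, x):
              z = torch.tanh(self.filter_conv(x)) * torch.sigmoid(self.gate_conv(x))
              skip = self.skip_conv(z)
              residual = self.residual_conv(z) + x[..., -z.size(-1):]  # align lengths
              return residual, skip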

    opened by Vichoko 1
  • why tune n here?isn't n the batch size?

    Here is a segment of wavenet_modules:

      l_old = int(round(l / dilation_factor))
      n_old = int(round(n * dilation_factor))
      l = math.ceil(l * init_dilation / dilation)
      n = math.ceil(n * dilation / init_dilation)

    I can't figure out why the batch size changes. I think that this way the net would be sensitive to the order of the data. Can someone help with this? Thanks.
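
    For context, a minimal sketch of the time-to-batch folding that such a dilation function typically performs: the time axis is split into `dilation` interleaved sub-sequences that are stacked along the batch dimension, which is why n grows while l shrinks (illustrative code, not the exact dilate function from wavenet_modules):

      import torch

      def fold_time_into_batch(x, dilation):
          # x: (n, c, l) -> (n * dilation, c, l // dilation)
          # Samples that are `dilation` steps apart end up adjacent, so an
          # ordinary stride-1 convolution on the result acts like a dilated one.
          n, c, l = x.size()
          x = x.view(n, c, l // dilation, dilation)      # split the time axis
          x = x.permute(0, 3, 1, 2).contiguous()         # (n, dilation, c, l // dilation)
          return x.view(n * dilation, c, l // dilation)

      x = torch.arange(16.).view(1, 1, 16)
      print(fold_time_into_batch(x, 2).squeeze(1))
      # tensor([[ 0.,  2.,  4.,  6.,  8., 10., 12., 14.],
      #         [ 1.,  3.,  5.,  7.,  9., 11., 13., 15.]])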

    opened by andylida 1
  • About the dataset that is already provided

    Hello @vincentherrmann, can you give us a little information about the sample dataset.npz file that is present in the repository?

    I wanted to know the sampling rate of the file and how many files were used to create it (I believe it is just one file, since numpy.load gives me just one file in it). I get the length of the dataset as 598277 items.

    Some details about that file would really help.

    Thanks a lot in advance.

    opened by krishnakt031990 0
  • Tensors accidentally sharing memory

    Hey, I noticed a potential issue on line 154 in wavenet_model.py

    https://github.com/vincentherrmann/pytorch-wavenet/blob/2b7bfb20e1e6b65dd8bcfbea84095e387eee286c/wavenet_model.py#L154

    I haven't done a deep dive to confirm, but I'm guessing this line is causing the 's' and 'x' tensors to share the same memory.

    I think this would fix it: s = x.clone()
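
    For context, a minimal demonstration of the aliasing concern: plain assignment makes two names refer to the same storage, while .clone() copies it (generic PyTorch behaviour, independent of the code in wavenet_model.py):

      import torch

      x = torch.zeros(3)
      s = x            # s and x share the same memory
      x += 1
      print(s)         # tensor([1., 1., 1.]) -- s changed too

      x = torch.zeros(3)
      s = x.clone()    # independent copy
      x += 1
      print(s)         # tensor([0., 0., 0.]) -- s is unaffected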

    Hope this helps!

    opened by Ljferrer 0
  • why do I generate something like noise?

    hi,

    I used your default data and trained the model; the training curve is in the attached screenshot. Then I used generate_script.py to generate audio (see the second screenshot).

    I only changed the code as shown in the third screenshot (no other code changed). Did this affect the generated result?

    Best,

    opened by meadow163 0
  • ConstantPad1d Deprecated

    Hey,

    Can you please help?

    I am facing a RuntimeError with this function.

    class ConstantPad1d(Function):
        def __init__(self, target_size, dimension=0, value=0, pad_start=False):
            super(ConstantPad1d, self).__init__()
            self.target_size = target_size
            self.dimension = dimension
            self.value = value
            self.pad_start = pad_start

        def forward(self, input):
            self.num_pad = self.target_size - input.size(self.dimension)
            assert self.num_pad >= 0, 'target size has to be greater than input size'

            self.input_size = input.size()

            size = list(input.size())
            size[self.dimension] = self.target_size
            output = input.new(*tuple(size)).fill_(self.value)
            c_output = output

            # crop output
            if self.pad_start:
                c_output = c_output.narrow(self.dimension, self.num_pad, c_output.size(self.dimension) - self.num_pad)
            else:
                c_output = c_output.narrow(self.dimension, 0, c_output.size(self.dimension) - self.num_pad)

            c_output.copy_(input)
            return output

        def backward(self, grad_output):
            grad_input = grad_output.new(*self.input_size).zero_()
            cg_output = grad_output

            # crop grad_output
            if self.pad_start:
                cg_output = cg_output.narrow(self.dimension, self.num_pad, cg_output.size(self.dimension) - self.num_pad)
            else:
                cg_output = cg_output.narrow(self.dimension, 0, cg_output.size(self.dimension) - self.num_pad)

            grad_input.copy_(cg_output)
            return grad_input

    RuntimeError: Legacy autograd function with non-static forward method is deprecated. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)

    Can you please help?
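
    The error means the legacy Function (with a non-static forward) must be rewritten with static forward/backward methods, or simply replaced by torch.nn.functional.pad, which is already differentiable. A minimal sketch of the simpler route (illustrative, assuming the padding is applied to the last, i.e. time, dimension; not a drop-in tested patch):

      import torch
      import torch.nn.functional as F

      def constant_pad_1d(input, target_size, value=0.0, pad_start=False):
          # Pad the last (time) dimension of `input` up to `target_size`.
          # F.pad is differentiable, so no custom autograd Function is needed.
          num_pad = target_size - input.size(-1)
          assert num_pad >= 0, 'target size has to be greater than input size'
          padding = (num_pad, 0) if pad_start else (0, num_pad)
          return F.pad(input, padding, mode='constant', value=value)

      x = torch.randn(1, 2, 5, requires_grad=True)
      y = constant_pad_1d(x, 8, pad_start=True)   # shape (1, 2, 8)
      y.sum().backward()                          # gradients flow through the pad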

    opened by jaytimbadia 7
  • Why dilate on batch dimension

    Currently the batch dimension is forced to be 1, since the dilation is occurring on the 0th dimension of the input tensor. Why is this not being done in a minibatch way?

    opened by btickell 0
  • Meaning of batch size here

    I am wondering what exactly the batch size means in the context of training this network? The batch dimension is inflated during dilation when running the training code and the dataset is set to shuffle for the training set. Does this not imply that a series of discontinuous time series data is being concatenated together and treated as a single sample?

    opened by btickell 0
  • ZeroDivisionError: division by zero

    Traceback (most recent call last):
      File "train_script.py", line 87, in <module>
        continue_training_at_step=0)
      File "/disk2/sushunqi/pytorch-wavenet-master/wavenet_training.py", line 90, in train
        self.logger.log(step, loss)
      File "/disk2/sushunqi/pytorch-wavenet-master/model_logging.py", line 35, in log
        self.validate(current_step)
      File "/disk2/sushunqi/pytorch-wavenet-master/model_logging.py", line 86, in validate
        avg_loss, avg_accuracy = self.trainer.validate()
      File "/disk2/sushunqi/pytorch-wavenet-master/wavenet_training.py", line 110, in validate
        avg_loss = total_loss / len(self.dataloader)
    ZeroDivisionError: division by zero

    Thank you very much!!
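
    For context, the traceback shows that len(self.dataloader) is 0, which usually means the validation split of the dataset is empty. A self-contained illustration of the failure mode and a guard on the division (illustrative only; the real fix is making sure the validation set is non-empty):

      import torch
      from torch.utils.data import DataLoader, TensorDataset

      # An empty validation dataset yields a DataLoader of length 0, so dividing
      # the summed loss by len(dataloader) raises ZeroDivisionError.
      empty_validation = TensorDataset(torch.empty(0, 1))
      loader = DataLoader(empty_validation, batch_size=16)

      total_loss = 0.0
      avg_loss = total_loss / len(loader) if len(loader) > 0 else float('nan')
      print(len(loader), avg_loss)   # 0 nan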

    opened by Sususuqii 0
  • TypeError: cannot pickle '_io.BufferedReader' object

    WaveNet_demo.ipynb - trainer.train(batch_size=16,epochs=10):

    Commenting out num_workers in wavenet_training.py did not help, nor did setting num_workers=0. torch 1.6.0, Python 3.

    Thank you for the help!


    TypeError                                 Traceback (most recent call last)
    <ipython-input> in <module>
         10
         11 print('start training...')
    ---> 12 trainer.train(batch_size=16, epochs=10)

    ~\Desktop\RUST\AudioSynth\pytorch-wavenet-master\wavenet_training.py in train(self, batch_size, epochs, continue_training_at_step)
         62
         63         if step % self.snapshot_interval == 0:
    ---> 64             if self.snapshot_path is None:
         65                 continue
         66             time_string = time.strftime("%Y-%m-%d_%H-%M-%S", time.gmtime())

    ~\AppData\Roaming\Python\Python38\site-packages\torch\utils\data\dataloader.py in __iter__(self)
        289             return _SingleProcessDataLoaderIter(self)
        290         else:
    --> 291             return _MultiProcessingDataLoaderIter(self)
        292
        293     @property

    ~\AppData\Roaming\Python\Python38\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
        735             #     before it starts, and __del__ tries to join but will get:
        736             #     AssertionError: can only join a started process.
    --> 737             w.start()
        738             self._index_queues.append(index_queue)
        739             self._workers.append(w)

    C:\Program Files\Python3.8\lib\multiprocessing\process.py in start(self)
        119                'daemonic processes are not allowed to have children'
        120         _cleanup()
    --> 121         self._popen = self._Popen(self)
        122         self._sentinel = self._popen.sentinel
        123         # Avoid a refcycle if the target function holds an indirect

    C:\Program Files\Python3.8\lib\multiprocessing\context.py in _Popen(process_obj)
        222     @staticmethod
        223     def _Popen(process_obj):
    --> 224         return _default_context.get_context().Process._Popen(process_obj)
        225
        226 class DefaultContext(BaseContext):

    C:\Program Files\Python3.8\lib\multiprocessing\context.py in _Popen(process_obj)
        324         def _Popen(process_obj):
        325             from .popen_spawn_win32 import Popen
    --> 326             return Popen(process_obj)
        327
        328     class SpawnContext(BaseContext):

    C:\Program Files\Python3.8\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
         91             try:
         92                 reduction.dump(prep_data, to_child)
    ---> 93                 reduction.dump(process_obj, to_child)
         94             finally:
         95                 set_spawning_popen(None)

    C:\Program Files\Python3.8\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
         58 def dump(obj, file, protocol=None):
         59     '''Replacement for pickle.dump() using ForkingPickler.'''
    ---> 60     ForkingPickler(file, protocol).dump(obj)
         61
         62 #

    TypeError: cannot pickle '_io.BufferedReader' object
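
    For context, on Windows DataLoader workers are started with the spawn method, so the dataset object must be picklable; an open file handle (such as a buffered reader kept around by a lazily loaded .npz archive) is not. A hedged sketch of the usual workarounds, assuming the DataLoader that matters is the one constructed inside wavenet_training.py (setting num_workers in the training call itself has no effect unless it is passed through to that DataLoader):

      import torch
      from torch.utils.data import DataLoader

      # Hypothetical illustration: use single-process loading so the dataset is
      # never pickled, and guard the entry point so spawned workers do not
      # re-execute the training script on Windows.
      def make_loader(dataset, batch_size=16):
          return DataLoader(dataset, batch_size=batch_size, shuffle=True,
                            num_workers=0)   # avoids pickling the dataset

      if __name__ == '__main__':
          pass  # build model / dataset / trainer and call trainer.train(...) here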

    opened by akhilsadam 4
Owner
Vincent Herrmann