Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

Overview

rave_logo

RAVE: Realtime Audio Variational autoEncoder

Official implementation of RAVE: A variational autoencoder for fast and high-quality neural audio synthesis (article link)

Installation

RAVE requires Python 3.9. Install the dependencies using

pip install -r requirements.txt

Training

Both RAVE and the prior model are available in this repo. For most users we recommend the cli_helper.py script, since it generates a set of instructions for training and exporting both RAVE and the prior model on a specific dataset.

python cli_helper.py

However, if you want to customize your training further, you can use the provided train_{rave, prior}.py and export_{rave, prior}.py scripts manually.
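For reference, a manual run boils down to invocations like the one below (a sketch: only flags that already appear in the issues on this page are shown, and the paths are hypothetical):

python train_rave.py --name mymodel --wav /path/to/audio --preprocessed /tmp/rave/mymodel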

Realtime usage

[NOT AVAILABLE YET]

RAVE and the prior model can be used in realtime inside Max/MSP, allowing creative interactions with both models. Code and details about this part of the project are not available yet; we are currently working on the corresponding article!

max_msp_screenshot

An audio example of the prior sampling patch is available in the docs/ folder.

Comments
  • Error when training prior

    Error when training prior

    Hi,

I'm having trouble training the prior, with a similar error to the one described in closed issue #33. I'm reopening the issue, as I don't think it was completely solved.

Basically, I get the same RuntimeError as described here. Training of RAVE was done with custom parameters, but I can't seem to access them, as there was no instructions_*.txt generated when working with cli_helper.py.

    RuntimeError: cannot reshape tensor of 0 elements into shape [8, 0, 128, -1] because the unspecified dimension size -1 can be any value and is ambiguous

    Thanks!

    opened by moiseshorta 23
  • Error trying to launch train_rave.py

    Error trying to launch train_rave.py

Hi, I was trying to launch train_rave.py with a test dataset. I am using 310 .wav files with cli_helper.py, which returned an error like this:

$ python train_rave.py --name training1 --wav ./dataset/1 --preprocessed /tmp/rave/training1/rave
...
5_38_26.wav: 99%|███████████████████████████▊| 308/310 [00:46<00:00, 6.69it/s]
/home/pablo/.local/lib/python3.9/site-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
2_38_13.wav: 100%|███████████████████████████▉| 309/310 [00:46<00:00, 6.60it/s]
/home/pablo/.local/lib/python3.9/site-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
2_38_13.wav: 100%|████████████████████████████| 310/310 [00:46<00:00, 6.61it/s]
Traceback (most recent call last):
  File "/home/pablo/RAVE/train_rave.py", line 77, in <module>
    dataset = SimpleDataset(
  File "/home/pablo/.local/lib/python3.9/site-packages/udls/simple_dataset.py", line 83, in __init__
    raise Exception("No data found !")
Exception: No data found !

Have you seen this error before? I am trying to use it with a GPU, but launching with CPU only throws the same error. Maybe there is a problem with the dataset?
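One way to narrow this down might be to check whether librosa can actually decode the files that triggered the PySoundFile warning above, since files it cannot decode may be skipped during preprocessing and leave the database empty. A minimal sketch, with a hypothetical file path:

import librosa

# one of the files that raised the warning in the log above (path hypothetical)
y, sr = librosa.load("./dataset/1/2_38_13.wav", sr=None)
print(y.shape, sr)  # an exception or an empty array here points at the file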

    Thanks!

    opened by pgm-n117 15
  • Kernel size error when rave.decode(z)

    Kernel size error when rave.decode(z)

I get the following error when I try generating from the prior:

Traceback (most recent call last):
  File "/home/syrinx/RAVE/ravezeke1-generate.py", line 43, in <module>
    y = rave.decode(z)
  File "/home/syrinx/RAVE/rave/model.py", line 582, in decode
    y = self.decoder(z, add_noise=True)
  File "/home/syrinx/miniconda2/envs/rave/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/syrinx/RAVE/rave/model.py", line 235, in forward
    x = self.net(x)
  File "/home/syrinx/miniconda2/envs/rave/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/syrinx/miniconda2/envs/rave/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/syrinx/miniconda2/envs/rave/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1120, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/syrinx/miniconda2/envs/rave/lib/python3.9/site-packages/cached_conv/convs.py", line 74, in forward
    return nn.functional.conv1d(
RuntimeError: Calculated padded input size per channel: (6). Kernel size: (7). Kernel size can't be greater than actual input size

This is the code:

################ PRIOR GENERATION ################

# STEP 1: CREATE DUMMY INPUT TENSOR
generation_length = 2**18  # approximately 6s at 48kHz
x = torch.randn(1, 1, generation_length)  # dummy input
z = rave.encode(x)  # dummy latent representation
z = torch.zeros_like(z)

# STEP 2: AUTOREGRESSIVE GENERATION
z = prior.quantized_normal.encode(prior.diagonal_shift(z))
z = prior.generate(z)
z = prior.diagonal_shift.inverse(prior.quantized_normal.decode(z))

# STEP 3: SYNTHESIS AND EXPORT
y = rave.decode(z)
sf.write("output_audio.wav", y.reshape(-1).numpy(), sr)

When I change generation_length to a smaller size, I get the error:

    RuntimeError: cannot reshape tensor of 0 elements into shape [1, 0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

My audio is at 44.1 kHz.
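Both errors seem to point at the latent sequence length: the failing convolution needs at least kernel-size (7) latent steps, and the reshape fails when the latent sequence is empty. A minimal sanity check, assuming a total downsampling ratio of 2**14 (an assumption; check your model's actual configuration):

import torch
from rave import RAVE

torch.set_grad_enabled(False)
rave = RAVE.load_from_checkpoint("rave.ckpt").eval()  # path hypothetical

ratio = 2**14  # assumed product of PQMF and encoder strides
generation_length = 2**18
x = torch.randn(1, 1, generation_length)
z = rave.encode(x)
# if z.shape[-1] is 0 (or below the decoder's kernel size), decoding will
# fail exactly as above; increase generation_length until it is not
print(z.shape, generation_length // ratio)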

    opened by robinmeier 15
  • Fixed loss during prior training

    Fixed loss during prior training

    Hello,

It seems that I'm missing something in the training procedure. I trained a RAVE model for about 650k steps (after which it seemed to plateau).

Then I exported it and started training the prior. Weirdly, I am already at 500k steps and the loss is not decreasing; it seems to be stuck between 3.19 and 3.12.

When I try to get an audio sample in TensorBoard, the prior model outputs noise, whereas the RAVE one is doing OK.

I am not sure what I'm doing wrong. I read some people talking about phase 2 kicking in, but I'm also confused whether this refers to the second step (the GAN training) or something happening within the first step.

    If someone could shed some light on these questions that would be super helpful. Thanks!

    opened by chebmarcel 10
  • Training tip for complex datasets

    Training tip for complex datasets

    Hi,

    Just wanted to share some insights into training RAVE with complex datasets (full songs, heterogeneous sounds).

One thing I've found that gives good convergence is extending the latent size parameter to the full 128. What I think had an even greater impact was extending the Phase 1 training (the warmup) to about 5 million steps before switching to Phase 2.

Obviously, more data will give better results, but so far I've found that the more 'detailed' features in my dataset (namely melodies and the textures of individual mid-to-high-frequency instruments) only start converging well beyond the suggested warmup of 1 million steps, as the loss keeps going down consistently.

    Hope this helps anyone having trouble with this.

    Good luck!

    opened by moiseshorta 7
  • Parallel Training

    Parallel Training

    Hey again,

    thanks again for this awesome library. I have been playing around with it for about a month now and I feel like I am finally starting to get the hang of it haha.

    I noticed you have a parallel_traning.sh script. This seems to be for training multiple models in parallel, is that correct?

I am training these models on my own hardware (which can take some time) and would love to make use of both of the GPUs I have. I tried to modify the script a few weeks ago but ran into errors with pytorch_lightning, specifically with a data class. Is this something you plan on supporting?

    If not, I would love to take a crack at it and make a PR.

    What are your thoughts?
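If I do, my starting point would be pytorch_lightning's own multi-GPU support, roughly as below (a sketch only: argument names vary across Lightning versions, and this is not a supported RAVE option as far as I know):

import pytorch_lightning as pl

# DDP replicates the model on both GPUs and splits each batch between them;
# newer Lightning versions spell this accelerator="gpu", devices=2 instead
trainer = pl.Trainer(gpus=2, strategy="ddp")
# trainer.fit(model, train_loader, val_loader)  # as in train_rave.py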

    opened by iamzoltan 7
  • train_rave.py does not recognise GPU

    train_rave.py does not recognise GPU

    Hey there,

It seems that the train_rave.py script (more specifically, PyTorch Lightning) can't find my system's GPU.

    A few machine specs:

    • Win 10 Pro
    • RTX 3060

    Steps:

    1. Downloaded this repository
    2. Created a new env using miniconda
    3. Installed pip, then installed all RAVE dependencies
    4. Ran cli_helper.py
5. Tried to call train_rave.py as per cli_helper's output; I'm getting the following error:

pytorch_lightning.utilities.exceptions.MisconfigurationException: GPUAccelerator can not run on your system since the accelerator is not available. The following accelerator(s) is available and can be passed into `accelerator` argument of `Trainer`: ['cpu'].
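Before digging into Lightning itself, it is worth checking whether the installed torch build can see the GPU at all; a CPU-only wheel is a common cause of this error on Windows. A quick check using the standard PyTorch API:

import torch

print(torch.cuda.is_available())  # False means Lightning will only offer 'cpu'
print(torch.version.cuda)         # None on CPU-only builds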

    Any ideas?

    Thanks in advance.

    opened by iorhythm 6
  • Error when trying to train the prior

    Error when trying to train the prior

    Hey again,

I just finished training and exporting a new model, but I can't seem to get it to train the prior. I am getting the following error when exporting the model:

    /home/user/code/RAVE/env3.9/lib/python3.9/site-packages/pytorch_lightning/core/saving.py:217: UserWarning: Found keys that are not in the model state dict but in the checkpoint: ['decoder.net.2.net.0.aligned.paddings.0.pad', 'decoder.net.2.net.0.aligned.paddings.1.pad', 'decoder.net.2.net.1.aligned.paddings.0.pad', 'decoder.net.2.net.1.aligned.paddings.1.pad', 'decoder.net.2.net.2.aligned.paddings.0.pad', 'decoder.net.2.net.2.aligned.paddings.1.pad', 'decoder.net.4.net.0.aligned.paddings.0.pad', 'decoder.net.4.net.0.aligned.paddings.1.pad', 'decoder.net.4.net.1.aligned.paddings.0.pad', 'decoder.net.4.net.1.aligned.paddings.1.pad', 'decoder.net.4.net.2.aligned.paddings.0.pad', 'decoder.net.4.net.2.aligned.paddings.1.pad', 'decoder.net.6.net.0.aligned.paddings.0.pad', 'decoder.net.6.net.0.aligned.paddings.1.pad', 'decoder.net.6.net.1.aligned.paddings.0.pad', 'decoder.net.6.net.1.aligned.paddings.1.pad', 'decoder.net.6.net.2.aligned.paddings.0.pad', 'decoder.net.6.net.2.aligned.paddings.1.pad', 'decoder.net.8.net.0.aligned.paddings.0.pad', 'decoder.net.8.net.0.aligned.paddings.1.pad', 'decoder.net.8.net.1.aligned.paddings.0.pad', 'decoder.net.8.net.1.aligned.paddings.1.pad', 'decoder.net.8.net.2.aligned.paddings.0.pad', 'decoder.net.8.net.2.aligned.paddings.1.pad', 'decoder.synth.paddings.0.pad', 'decoder.synth.paddings.1.pad', 'decoder.synth.paddings.2.pad'] rank_zero_warn(

Any ideas?

    opened by iamzoltan 6
  • Model Size Mismatch

    Model Size Mismatch

Not sure what's going on here, but after I pulled the latest master, I am getting this error:

    RuntimeError: Error(s) in loading state_dict for RAVE:
    	size mismatch for decoder.net.2.net.0.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.2.net.0.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.2.net.1.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.2.net.1.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.2.net.2.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.2.net.2.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.4.net.0.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.4.net.0.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.4.net.1.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.4.net.1.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.4.net.2.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.4.net.2.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.6.net.0.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.6.net.0.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.6.net.1.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.6.net.1.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.6.net.2.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.6.net.2.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.8.net.0.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.net.8.net.0.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.net.8.net.1.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.net.8.net.1.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.net.8.net.2.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.net.8.net.2.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.synth.paddings.0.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.synth.paddings.1.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.synth.paddings.2.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    

Any insights would be welcome!

    opened by iamzoltan 6
  • Can't load pretrained state

    Can't load pretrained state

Hey, I used rave.ckpt in the example "loading pretrained models" from the readme and got the following error:

    Sincerely sorry for the painfully nooby issue

    code

import torch

torch.set_grad_enabled(False)
from rave import RAVE
from prior import Prior

import librosa as li
import soundfile as sf

################ LOADING PRETRAINED MODELS ################
rave = RAVE.load_from_checkpoint("./rave_pretrained/darbouka/rave.ckpt").eval()

    error

Traceback (most recent call last):
  File "/Users/krayyy/PycharmProjects/pythonProject11/rave39test.py", line 11, in <module>
    rave = RAVE.load_from_checkpoint("./rave_pretrained/darbouka/rave.ckpt").eval()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/pytorch_lightning/core/saving.py", line 153, in load_from_checkpoint
    model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/pytorch_lightning/core/saving.py", line 201, in _load_model_state
    keys = model.load_state_dict(checkpoint["state_dict"], strict=strict)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/torch/nn/modules/module.py", line 1482, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for RAVE:
    Unexpected key(s) in state_dict: "decoder.net.2.net.0.aligned.paddings.0.pad", "decoder.net.2.net.0.aligned.paddings.1.pad", "decoder.net.2.net.1.aligned.paddings.0.pad", "decoder.net.2.net.1.aligned.paddings.1.pad", "decoder.net.2.net.2.aligned.paddings.0.pad", "decoder.net.2.net.2.aligned.paddings.1.pad", "decoder.net.4.net.0.aligned.paddings.0.pad", "decoder.net.4.net.0.aligned.paddings.1.pad", "decoder.net.4.net.1.aligned.paddings.0.pad", "decoder.net.4.net.1.aligned.paddings.1.pad", "decoder.net.4.net.2.aligned.paddings.0.pad", "decoder.net.4.net.2.aligned.paddings.1.pad", "decoder.net.6.net.0.aligned.paddings.0.pad", "decoder.net.6.net.0.aligned.paddings.1.pad", "decoder.net.6.net.1.aligned.paddings.0.pad", "decoder.net.6.net.1.aligned.paddings.1.pad", "decoder.net.6.net.2.aligned.paddings.0.pad", "decoder.net.6.net.2.aligned.paddings.1.pad", "decoder.net.8.net.0.aligned.paddings.0.pad", "decoder.net.8.net.0.aligned.paddings.1.pad", "decoder.net.8.net.1.aligned.paddings.0.pad", "decoder.net.8.net.1.aligned.paddings.1.pad", "decoder.net.8.net.2.aligned.paddings.0.pad", "decoder.net.8.net.2.aligned.paddings.1.pad", "decoder.synth.paddings.0.pad", "decoder.synth.paddings.1.pad", "decoder.synth.paddings.2.pad".
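A possible workaround sketch, using plain torch rather than any official fix: the *.pad entries are cached-convolution padding buffers, not learned weights, so stripping them from the checkpoint before loading should be harmless, assuming the current code no longer registers them:

import torch
from rave import RAVE

ckpt = torch.load("./rave_pretrained/darbouka/rave.ckpt", map_location="cpu")
# drop the runtime padding caches this code version does not expect
ckpt["state_dict"] = {
    k: v for k, v in ckpt["state_dict"].items() if not k.endswith(".pad")
}
torch.save(ckpt, "rave_filtered.ckpt")

rave = RAVE.load_from_checkpoint("rave_filtered.ckpt").eval()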

    opened by xxrraa 6
Protocol buffers error: TypeError: Descriptors cannot not be created directly.

Protocol buffers error: TypeError: Descriptors cannot not be created directly.

    After a fresh install of miniconda and updating conda itself (Linux, Ubuntu) I'm getting the following error when trying to train a RAVE model.

Prepending PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python to the training command gets it running. But is this expected behavior? Seems to be some kind of compatibility issue with the latest version of protobuf...?

    $ CUDA_VISIBLE_DEVICES=1 python train_rave.py --name blonk_nolatency_1 --wav ~/datasets/unconditional/Blonk/Vocalor/ --preprocessed ../jobs/rave/blonk_nolatency_1/tmp --sr 48000 --data-size 16 --no-latency true                                     
    
    Traceback (most recent call last):
      File "/its/home/user/RAVE/train_rave.py", line 4, in <module>
        from rave.model import RAVE
      File "/its/home/user/RAVE/rave/__init__.py", line 1, in <module>
        from .model import RAVE
      File "/its/home/user/RAVE/rave/model.py", line 5, in <module>
        import pytorch_lightning as pl
      File "/its/home/user/.local/lib/python3.9/site-packages/pytorch_lightning/__init__.py", line 30, in <module>
        from pytorch_lightning.callbacks import Callback  # noqa: E402
      File "/its/home/user/.local/lib/python3.9/site-packages/pytorch_lightning/callbacks/__init__.py", line 26, in <module>
        from pytorch_lightning.callbacks.pruning import ModelPruning
      File "/its/home/user/.local/lib/python3.9/site-packages/pytorch_lightning/callbacks/pruning.py", line 31, in <module>
        from pytorch_lightning.core.lightning import LightningModule
      File "/its/home/user/.local/lib/python3.9/site-packages/pytorch_lightning/core/__init__.py", line 16, in <module>
        from pytorch_lightning.core.lightning import LightningModule
      File "/its/home/user/.local/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 40, in <module>
        from pytorch_lightning.loggers import LightningLoggerBase
      File "/its/home/user/.local/lib/python3.9/site-packages/pytorch_lightning/loggers/__init__.py", line 18, in <module>
        from pytorch_lightning.loggers.tensorboard import TensorBoardLogger
      File "/its/home/user/.local/lib/python3.9/site-packages/pytorch_lightning/loggers/tensorboard.py", line 26, in <module>
        from torch.utils.tensorboard import SummaryWriter
      File "/its/home/user/.local/lib/python3.9/site-packages/torch/utils/tensorboard/__init__.py", line 10, in <module>
        from .writer import FileWriter, SummaryWriter  # noqa: F401
      File "/its/home/user/.local/lib/python3.9/site-packages/torch/utils/tensorboard/writer.py", line 9, in <module>
        from tensorboard.compat.proto.event_pb2 import SessionLog
      File "/its/home/user/.local/lib/python3.9/site-packages/tensorboard/compat/proto/event_pb2.py", line 17, in <module>
        from tensorboard.compat.proto import summary_pb2 as tensorboard_dot_compat_dot_proto_dot_summary__pb2
      File "/its/home/user/.local/lib/python3.9/site-packages/tensorboard/compat/proto/summary_pb2.py", line 17, in <module>
        from tensorboard.compat.proto import tensor_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__pb2
      File "/its/home/user/.local/lib/python3.9/site-packages/tensorboard/compat/proto/tensor_pb2.py", line 16, in <module>
        from tensorboard.compat.proto import resource_handle_pb2 as tensorboard_dot_compat_dot_proto_dot_resource__handle__pb2
      File "/its/home/user/.local/lib/python3.9/site-packages/tensorboard/compat/proto/resource_handle_pb2.py", line 16, in <module>
        from tensorboard.compat.proto import tensor_shape_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__shape__pb2
      File "/its/home/user/.local/lib/python3.9/site-packages/tensorboard/compat/proto/tensor_shape_pb2.py", line 36, in <module>
        _descriptor.FieldDescriptor(
      File "/its/home/user/.local/lib/python3.9/site-packages/google/protobuf/descriptor.py", line 560, in __new__
        _message.Message._CheckCalledFromGeneratedFile()
    TypeError: Descriptors cannot not be created directly.
    If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
    If you cannot immediately regenerate your protos, some other possible workarounds are:
     1. Downgrade the protobuf package to 3.20.x or lower.
     2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
    
    More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
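For what it's worth, the same workaround can be applied from inside the script, as long as the variable is set before anything imports protobuf; pinning protobuf to 3.20.x or lower, as the message suggests, is the cleaner fix. A minimal sketch:

import os

# must run before pytorch_lightning / tensorboard (and thus protobuf) import
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

import pytorch_lightning as pl  # imports cleanly now, at the cost of slower parsing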
    
    
    opened by jreus 5
Question about loudness-based loss

Question about loudness-based loss

    https://github.com/acids-ircam/RAVE/blob/54c6106eca9760041e7d01e10ba1e9f51f04fc5a/rave/core.py#L130

I understand that the loudness-based distance is just an approximation, but I wonder why it's added to the log spectrogram instead of multiplied. Furthermore, what about using a time-domain prefilter as implemented by @csteinmetz1:

    https://github.com/csteinmetz1/auraloss/blob/e732234398ada867138be634dbf66f40360461a2/auraloss/perceptual.py#L124-L129

and then implementing it like:

    mse(self.log_stft(time_domain_prefilter(x)), self.log_stft(time_domain_prefilter(y)))
    

Can you comment on some pros and cons of either way?
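For context on the add-versus-multiply question: adding a weighting curve in log space corresponds to multiplying the linear spectrogram by it, since log(w * S) = log(w) + log(S), so the two formulations may be closer than they look. A quick numeric check:

import numpy as np

S = np.random.rand(129, 100) + 1e-3  # toy magnitude spectrogram
w = np.random.rand(129, 1) + 1e-3    # toy per-bin loudness weighting
assert np.allclose(np.log(w * S), np.log(w) + np.log(S))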

    opened by faroit 0
  • lmdb.InvalidParameterError: Invalid argument

    lmdb.InvalidParameterError: Invalid argument

    When trying to run train_rave.py:

train_rave.py -c small --name test1 --wav wave_data --preprocessed out_dir --sr 16000 --data-size 16 --n-signal 65536 --no-latency false --cropped-latent-size 128 --max-kl 0.1

    I am getting the following error:

Traceback (most recent call last):
  File "", line 1, in <module>
  File "~/python3.10/site-packages/udls/simple_dataset.py", line 66, in __init__
    self.env = SimpleLMDBDataset(out_database_location, map_size)
  File "~/python3.10/site-packages/udls/base_dataset.py", line 12, in __init__
    self.env = lmdb.open(out_database_location,
lmdb.InvalidParameterError: out_dir: Invalid argument

It seems to be related to the --preprocessed argument. Is this just a directory path, or should I pass something else?
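As far as the other invocations on this page suggest, it is just a directory path; one hedged guess is that lmdb refuses a directory it cannot open, so creating it up front (standard library only) rules out the simplest cause:

import os

os.makedirs("out_dir", exist_ok=True)  # then pass --preprocessed out_dir
# if the error persists, the requested lmdb map_size may be more than the
# target filesystem supports; trying a local disk is another cheap test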

    opened by schramm 1
  • vast.ai error

    vast.ai error

I am trying to run RAVE on vast.ai. However, whether I run it via a notebook or via the CLI, I get the following error:

Traceback (most recent call last):
  File "/workspace/RAVE/train_rave.py", line 156, in <module>
    nepoch = args.VAL_EVERY // len(train)
ZeroDivisionError: integer division or modulo by zero

I followed the instructions, plus ran these two commands, as it couldn't find libsndfile otherwise: sudo apt-get update and sudo apt-get install libsndfile1.

How can we solve this? It would be great if we could run RAVE on vast.ai; it would be cheaper and, if desired, also faster. Thanks.

    opened by x319x 0
Receiving ValueError when training: n_components must be between 0 and 32

Receiving ValueError when training: n_components must be between 0 and 32

ValueError: n_components

    File "/content/miniconda/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 457, in _fit
        return self._fit_full(X, n_components)
      File "/content/miniconda/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 475, in _fit_full
        raise ValueError(
    ValueError: n_components=128 must be between 0 and min(n_samples, n_features)=32 with svd_solver='full'
    
    opened by nightshining 0
  • How many epochs of training should I expect?

    How many epochs of training should I expect?

    I've been running training on a set of audio files and am wondering how I should assess how training is going.

After about 24 hours, I'm at about 13,000 epochs. I'm not sure how to interpret the TensorBoard visualizations; any pointers would be very much appreciated.

    /content/drive/MyDrive/RAVE_COLLAB
    Recursive search in /content/drive/MyDrive/RAVE_COLLAB/resampled/parbass/
    audio_00158_00000.wav: 100% 159/159 [00:04<00:00, 33.67it/s] 
    /content/miniconda/lib/python3.9/site-packages/torch/utils/data/dataloader.py:487: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
      warnings.warn(_create_warning_msg(
    GPU available: True, used: True
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs
    Restoring states from the checkpoint path at /content/drive/MyDrive/RAVE_COLLAB/runs/parbass/rave/version_2/checkpoints/last-v1.ckpt
    /content/miniconda/lib/python3.9/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:342: UserWarning: The dirpath has changed from 'runs/parbass/rave/version_2/checkpoints' to 'runs/parbass/rave/version_3/checkpoints', therefore `best_model_score`, `kth_best_model_path`, `kth_value`, `last_model_path` and `best_k_models` won't be reloaded. Only `best_model_path` will be reloaded.
      warnings.warn(
    LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
    
      | Name          | Type                | Params
    ------------------------------------------------------
    0 | pqmf          | CachedPQMF          | 4.2 K 
    1 | loudness      | Loudness            | 0     
    2 | encoder       | Encoder             | 4.8 M 
    3 | decoder       | Generator           | 12.8 M
    4 | discriminator | StackDiscriminators | 16.9 M
    ------------------------------------------------------
    34.5 M    Trainable params
    0         Non-trainable params
    34.5 M    Total params
    138.092   Total estimated model params size (MB)
    Restored all states from the checkpoint file at /content/drive/MyDrive/RAVE_COLLAB/runs/parbass/rave/version_2/checkpoints/last-v1.ckpt
    /content/miniconda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1933: PossibleUserWarning: The number of training batches (19) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
      rank_zero_warn(
    Epoch 11571:   0% 0/20 [00:00<00:00, -106397.56it/s]/content/miniconda/lib/python3.9/site-packages/torch/utils/data/dataloader.py:487: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
      warnings.warn(_create_warning_msg(
    Epoch 12623:  95% 19/20 [00:04<?, ?it/s, v_num=3]
    Validation: 0it [00:00, ?it/s]
    Validation:   0% 0/1 [00:00<?, ?it/s]
    Validation DataLoader 0:   0% 0/1 [00:00<?, ?it/s]
    Epoch 12623: 100% 20/20 [00:04<00:00,  4.59s/it, v_num=3]
    Epoch 12624:   0% 0/19 [00:00<00:00, -111926.65it/s, v_num=3]/content/miniconda/lib/python3.9/site-packages/torch/utils/data/dataloader.py:487: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
      warnings.warn(_create_warning_msg(
    Epoch 12706:  63% 12/19 [00:03<-1:59:57, -2.16it/s, v_num=3]
    
    

(Three TensorBoard screenshots attached.)

    opened by batchku 0
  • AttributeError: 'NoneType' object has no attribute 'astype'

    AttributeError: 'NoneType' object has no attribute 'astype'

OK, so this might not be a big issue, but I can't seem to train RAVE on my computer because of this simple line:

        preprocess = lambda name: simple_audio_preprocess(
            args.SR,
            2 * args.N_SIGNAL,
        )(name).astype(np.float16)
    

All I get is:

    Traceback (most recent call last):
      File "..\02_Models\RAVE\train_rave.py", line 99, in <module>
        dataset = SimpleDataset(
      File "C:\Users\User\AppData\Roaming\Python\Python39\site-packages\udls\simple_dataset.py", line 80, in __init__
        self._preprocess()
      File "C:\Users\User\AppData\Roaming\Python\Python39\site-packages\udls\simple_dataset.py", line 120, in _preprocess
        output = self.preprocess_function(wav)
      File "..\02_Models\RAVE\train_rave.py", line 94, in <lambda>
        preprocess = lambda name: simple_audio_preprocess(
    AttributeError: 'NoneType' object has no attribute 'astype'
    

I feel a little bit lost here... I also tried removing the cast to float16 and ended up with a "No data found !" error. My dataset comprises 1,802 stereo .wav files at 44100 Hz (15 GB).
It's probably a silly problem, but I would really like to test this model!
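A defensive sketch of that lambda, assuming (as the traceback suggests) that the loader returns None for files it cannot decode; simple_audio_preprocess and args are the ones from train_rave.py, and the udls import path is assumed:

import numpy as np
from udls import simple_audio_preprocess  # import path assumed

# args comes from train_rave.py's argument parsing
_base = simple_audio_preprocess(args.SR, 2 * args.N_SIGNAL)

def preprocess(name):
    out = _base(name)
    if out is None:
        # undecodable file: skip it instead of crashing the whole pass
        return None
    return out.astype(np.float16)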

    opened by Monratus 1