Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

Overview

rave_logo

RAVE: Realtime Audio Variational autoEncoder

Official implementation of RAVE: A variational autoencoder for fast and high-quality neural audio synthesis (article link) by Antoine Caillon and Philippe Esling.

If you use RAVE as a part of a music performance or installation, be sure to cite either this repository or the article!

Installation

RAVE requires Python 3.9. Install the dependencies using

pip install -r requirements.txt

Detailed instructions to set up a training station for this project are available here.

Preprocessing

RAVE comes with two command-line utilities, resample and duration. resample pre-processes (silence removal, loudness normalization) and augments (compression) an entire directory of audio files (.mp3, .aiff, .opus, .wav, .aac). duration prints the total duration of a folder of .wav files.
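
As a quick cross-check of duration's output, the same number can be computed with a few lines of Python. This is an illustrative sketch assuming the soundfile package and a flat folder of .wav files, not the utility's actual implementation:

    import pathlib

    import soundfile as sf

    # sum the duration (in seconds) reported by each .wav file's header
    folder = pathlib.Path("/path/to/wav/folder")
    total = sum(sf.info(str(p)).duration for p in folder.glob("*.wav"))
    print(f"total duration: {total / 3600:.2f} hours")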

Training

Both RAVE and the prior model are available in this repo. For most users we recommend the cli_helper.py script, since it generates a set of instructions for training and exporting both RAVE and the prior model on a specific dataset.

python cli_helper.py

However, if you want to customize your training further, you can use the provided train_{rave, prior}.py and export_{rave, prior}.py scripts directly.
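
For reference, a manual training invocation might look like the sketch below; the flags shown are the ones that appear in the usage examples quoted later on this page, and python train_rave.py --help should list the full set:

    python train_rave.py \
        --name mymodel \
        --wav /path/to/wav/folder \
        --preprocessed /tmp/rave/mymodel \
        --sr 48000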

Reconstructing audio

Once trained, you can reconstruct an entire folder of .wav files using

python reconstruct.py --ckpt /path/to/checkpoint --wav-folder /path/to/wav/folder

You can also export RAVE to a torchscript file using export_rave.py and use the encode and decode methods on tensors.
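
For example, a round-trip through an exported model could look like the following minimal sketch; the .ts file name is a placeholder for whatever export_rave.py produced, and the (batch, channels, samples) input shape matches the generation example quoted in the comments below:

    import torch

    model = torch.jit.load("rave.ts").eval()  # file produced by export_rave.py
    x = torch.randn(1, 1, 2**16)              # dummy mono audio, shape (B, 1, T)
    z = model.encode(x)                       # audio -> latent representation
    x_hat = model.decode(z)                   # latent -> reconstructed audio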

Realtime usage

UPDATE

If you want to use the realtime mode, you should update your dependencies!

pip install -r requirements.txt

RAVE and the prior model can be used in realtime on live audio streams, allowing creative interactions with both models.

nn~

RAVE is compatible with the nn~ external for Max/MSP and PureData.

max_msp_screenshot

An audio example of the prior sampling patch is available in the docs/ folder.

RAVE vst

You can also use RAVE as a VST audio plugin using the RAVE vst!

plugin_screenshot

Discussion

If you have questions, want to share your experience with RAVE, or want to share musical pieces made with the model, you can use the Discussion tab!

Comments
  • Error when training prior

    Error when training prior

    Hi,

    I'm experiencing trouble with training the prior, with an error similar to the one described in closed issue #33. I'm reopening the issue, as I don't think it was completely solved.

    I basically get the same RuntimeError as described here. Training of RAVE was done with custom parameters, but I can't seem to access them, as there was no instructions_*.txt generated when working with cli_helper.py.

    RuntimeError: cannot reshape tensor of 0 elements into shape [8, 0, 128, -1] because the unspecified dimension size -1 can be any value and is ambiguous

    Thanks!

    opened by moiseshorta 23
  • Error trying to launch train_rave.py

    Error trying to launch train_rave.py

    Hi, I was trying to launch train_rave.py with a dataset for testing. I am using 310 .wav files with cli_helper.py, which returned an error like this:

    $ python train_rave.py --name training1 --wav ./dataset/1 --preprocessed /tmp/rave/training1/rave
    ...
    5_38_26.wav:  99%|███████████████████████████▊| 308/310 [00:46<00:00, 6.69it/s]
    /home/pablo/.local/lib/python3.9/site-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
      warnings.warn("PySoundFile failed. Trying audioread instead.")
    2_38_13.wav: 100%|███████████████████████████▉| 309/310 [00:46<00:00, 6.60it/s]
    /home/pablo/.local/lib/python3.9/site-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
      warnings.warn("PySoundFile failed. Trying audioread instead.")
    2_38_13.wav: 100%|████████████████████████████| 310/310 [00:46<00:00, 6.61it/s]
    Traceback (most recent call last):
      File "/home/pablo/RAVE/train_rave.py", line 77, in <module>
        dataset = SimpleDataset(
      File "/home/pablo/.local/lib/python3.9/site-packages/udls/simple_dataset.py", line 83, in __init__
        raise Exception("No data found !")
    Exception: No data found !

    Have you seen this error before? I am trying to use it with a GPU, but launching only with CPU throws the same error. Maybe there is a problem with the dataset?

    Thanks!

    opened by pgm-n117 15
  • Kernel size error when rave.decode(z)

    Kernel size error when rave.decode(z)

    I get the following error when I try generating from the prior:

    Traceback (most recent call last):
      File "/home/syrinx/RAVE/ravezeke1-generate.py", line 43, in <module>
        y = rave.decode(z)
      File "/home/syrinx/RAVE/rave/model.py", line 582, in decode
        y = self.decoder(z, add_noise=True)
      File "/home/syrinx/miniconda2/envs/rave/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/syrinx/RAVE/rave/model.py", line 235, in forward
        x = self.net(x)
      File "/home/syrinx/miniconda2/envs/rave/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/syrinx/miniconda2/envs/rave/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
        input = module(input)
      File "/home/syrinx/miniconda2/envs/rave/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1120, in _call_impl
        result = forward_call(*input, **kwargs)
      File "/home/syrinx/miniconda2/envs/rave/lib/python3.9/site-packages/cached_conv/convs.py", line 74, in forward
        return nn.functional.conv1d(
    RuntimeError: Calculated padded input size per channel: (6). Kernel size: (7). Kernel size can't be greater than actual input size

    This is the code:

        ################ PRIOR GENERATION ################

        # STEP 1: CREATE DUMMY INPUT TENSOR
        generation_length = 2**18  # approximately 6s at 48kHz
        x = torch.randn(1, 1, generation_length)  # dummy input
        z = rave.encode(x)  # dummy latent representation
        z = torch.zeros_like(z)

        # STEP 2: AUTOREGRESSIVE GENERATION
        z = prior.quantized_normal.encode(prior.diagonal_shift(z))
        z = prior.generate(z)
        z = prior.diagonal_shift.inverse(prior.quantized_normal.decode(z))

        # STEP 3: SYNTHESIS AND EXPORT
        y = rave.decode(z)
        sf.write("output_audio.wav", y.reshape(-1).numpy(), sr)

    When I change generation_length to a smaller size, I get the error:

    RuntimeError: cannot reshape tensor of 0 elements into shape [1, 0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

    My audio is at 44.1 kHz.

    opened by robinmeier 15
  • Fixed loss during prior training

    Fixed loss during prior training

    Hello,

    It seems that I'm missing something in the training procedure. I trained a RAVE model for about 650k steps (after which it seemed to plateau).

    Then I exported it and started training the prior. Weirdly, I am already at 500k steps and the loss is not decreasing; it seems to be stuck between 3.19 and 3.12.

    When I try to get an audio sample in Tensorboard, the prior model outputs noise, whereas the RAVE one is doing OK.

    I am not sure what I'm doing wrong. I read some people talking about phase 2 kicking in, but I'm also confused whether this means the second step (the GAN training) or something happening within the first step.

    If someone could shed some light on these questions that would be super helpful. Thanks!

    opened by chebmarcel 10
  • Training tip for complex datasets

    Training tip for complex datasets

    Hi,

    Just wanted to share some insights into training RAVE with complex datasets (full songs, heterogeneous sounds).

    One thing I've found that gives good convergence is extending the latent size parameter to the full 128. What I think had an even greater impact, though, is extending the Phase 1 training (the warmup) to about 5 million steps before switching to Phase 2.

    Obviously, more data will give better results, but so far I've found that the more 'detailed' features in my dataset (namely melodies and the textures of individual mid-to-high-frequency instruments) start converging better beyond the suggested warmup phase of 1 million steps, as the loss keeps going down consistently.

    Hope this helps anyone having trouble with this.

    Good luck!

    opened by moiseshorta 7
  • Parallel Training

    Parallel Training

    Hey again,

    Thanks again for this awesome library. I have been playing around with it for about a month now and I feel like I am finally starting to get the hang of it, haha.

    I noticed you have a parallel_traning.sh script. This seems to be for training multiple models in parallel, is that correct?

    I am training these models on my own hardware (which can take some time) and would love to make use of both of the GPUs I have. I tried to modify the script a few weeks ago but ran into errors with pytorch_lightning, specifically with a data class. Is this something you plan on supporting?

    If not, I would love to take a crack at it and make a PR.

    What are your thoughts?

    opened by iamzoltan 7
  • train_rave.py does not recognise GPU

    train_rave.py does not recognise GPU

    Hey there,

    It seems that the train_rave.py script (more specifically, PyTorch Lightning) can't find my system's GPU.

    A few machine specs:

    • Win 10 Pro
    • RTX 3060

    Steps:

    1. Downloaded this repository
    2. Created a new env using miniconda
    3. Installed pip, then installed all RAVE dependencies
    4. Ran cli_helper.py
    5. Tried to call train_rave.py as per cli_helper's output. I'm getting the following error:

       pytorch_lightning.utilities.exceptions.MisconfigurationException: GPUAccelerator can not run on your system since the accelerator is not available. The following accelerator(s) is available and can be passed into `accelerator` argument of `Trainer`: ['cpu'].

    Any ideas?

    Thanks in advance.
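
    A quick first check is whether PyTorch itself can see the GPU; if the one-liner below prints False (a minimal diagnostic, not specific to RAVE), the problem is likely a CPU-only torch build rather than RAVE:

        python -c "import torch; print(torch.cuda.is_available())"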

    opened by iorhythm 6
  • Error when trying to train the prior

    Error when trying to train the prior

    Hey again,

    I just finished training and exporting a new model, but I can't seem to get it to train the prior. I am getting the following error when exporting the model:

    /home/user/code/RAVE/env3.9/lib/python3.9/site-packages/pytorch_lightning/core/saving.py:217: UserWarning: Found keys that are not in the model state dict but in the checkpoint: ['decoder.net.2.net.0.aligned.paddings.0.pad', 'decoder.net.2.net.0.aligned.paddings.1.pad', 'decoder.net.2.net.1.aligned.paddings.0.pad', 'decoder.net.2.net.1.aligned.paddings.1.pad', 'decoder.net.2.net.2.aligned.paddings.0.pad', 'decoder.net.2.net.2.aligned.paddings.1.pad', 'decoder.net.4.net.0.aligned.paddings.0.pad', 'decoder.net.4.net.0.aligned.paddings.1.pad', 'decoder.net.4.net.1.aligned.paddings.0.pad', 'decoder.net.4.net.1.aligned.paddings.1.pad', 'decoder.net.4.net.2.aligned.paddings.0.pad', 'decoder.net.4.net.2.aligned.paddings.1.pad', 'decoder.net.6.net.0.aligned.paddings.0.pad', 'decoder.net.6.net.0.aligned.paddings.1.pad', 'decoder.net.6.net.1.aligned.paddings.0.pad', 'decoder.net.6.net.1.aligned.paddings.1.pad', 'decoder.net.6.net.2.aligned.paddings.0.pad', 'decoder.net.6.net.2.aligned.paddings.1.pad', 'decoder.net.8.net.0.aligned.paddings.0.pad', 'decoder.net.8.net.0.aligned.paddings.1.pad', 'decoder.net.8.net.1.aligned.paddings.0.pad', 'decoder.net.8.net.1.aligned.paddings.1.pad', 'decoder.net.8.net.2.aligned.paddings.0.pad', 'decoder.net.8.net.2.aligned.paddings.1.pad', 'decoder.synth.paddings.0.pad', 'decoder.synth.paddings.1.pad', 'decoder.synth.paddings.2.pad'] rank_zero_warn(

    Any ideas?

    opened by iamzoltan 6
  • Model Size Mismatch

    Model Size Mismatch

    Not sure what's going on here, but after I pulled the latest master, I am getting this error:

    RuntimeError: Error(s) in loading state_dict for RAVE:
    	size mismatch for decoder.net.2.net.0.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.2.net.0.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.2.net.1.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.2.net.1.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.2.net.2.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.2.net.2.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 512, 0]) from checkpoint, the shape in current model is torch.Size([64, 512, 0]).
    	size mismatch for decoder.net.4.net.0.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.4.net.0.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.4.net.1.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.4.net.1.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.4.net.2.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.4.net.2.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 256, 0]) from checkpoint, the shape in current model is torch.Size([64, 256, 0]).
    	size mismatch for decoder.net.6.net.0.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.6.net.0.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.6.net.1.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.6.net.1.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.6.net.2.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.6.net.2.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 128, 0]) from checkpoint, the shape in current model is torch.Size([64, 128, 0]).
    	size mismatch for decoder.net.8.net.0.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.net.8.net.0.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.net.8.net.1.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.net.8.net.1.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.net.8.net.2.aligned.paddings.0.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.net.8.net.2.aligned.paddings.1.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.synth.paddings.0.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.synth.paddings.1.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    	size mismatch for decoder.synth.paddings.2.pad: copying a param with shape torch.Size([8, 64, 0]) from checkpoint, the shape in current model is torch.Size([64, 64, 0]).
    

    any insights would be welcomed!

    opened by iamzoltan 6
  • Can't load pretrained state

    Can't load pretrained state

    Hey, I used rave.ckpt in the "loading pretrained models" example from the readme and got the following error:

    Sincerely sorry for the painfully nooby issue

    code

        import torch

        torch.set_grad_enabled(False)
        from rave import RAVE
        from prior import Prior

        import librosa as li
        import soundfile as sf

        ################ LOADING PRETRAINED MODELS ################
        rave = RAVE.load_from_checkpoint("./rave_pretrained/darbouka/rave.ckpt").eval()

    error

    Traceback (most recent call last):
      File "/Users/krayyy/PycharmProjects/pythonProject11/rave39test.py", line 11, in <module>
        rave = RAVE.load_from_checkpoint("./rave_pretrained/darbouka/rave.ckpt").eval()
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/pytorch_lightning/core/saving.py", line 153, in load_from_checkpoint
        model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/pytorch_lightning/core/saving.py", line 201, in _load_model_state
        keys = model.load_state_dict(checkpoint["state_dict"], strict=strict)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/torch/nn/modules/module.py", line 1482, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for RAVE: Unexpected key(s) in state_dict: "decoder.net.2.net.0.aligned.paddings.0.pad", "decoder.net.2.net.0.aligned.paddings.1.pad", "decoder.net.2.net.1.aligned.paddings.0.pad", "decoder.net.2.net.1.aligned.paddings.1.pad", "decoder.net.2.net.2.aligned.paddings.0.pad", "decoder.net.2.net.2.aligned.paddings.1.pad", "decoder.net.4.net.0.aligned.paddings.0.pad", "decoder.net.4.net.0.aligned.paddings.1.pad", "decoder.net.4.net.1.aligned.paddings.0.pad", "decoder.net.4.net.1.aligned.paddings.1.pad", "decoder.net.4.net.2.aligned.paddings.0.pad", "decoder.net.4.net.2.aligned.paddings.1.pad", "decoder.net.6.net.0.aligned.paddings.0.pad", "decoder.net.6.net.0.aligned.paddings.1.pad", "decoder.net.6.net.1.aligned.paddings.0.pad", "decoder.net.6.net.1.aligned.paddings.1.pad", "decoder.net.6.net.2.aligned.paddings.0.pad", "decoder.net.6.net.2.aligned.paddings.1.pad", "decoder.net.8.net.0.aligned.paddings.0.pad", "decoder.net.8.net.0.aligned.paddings.1.pad", "decoder.net.8.net.1.aligned.paddings.0.pad", "decoder.net.8.net.1.aligned.paddings.1.pad", "decoder.net.8.net.2.aligned.paddings.0.pad", "decoder.net.8.net.2.aligned.paddings.1.pad", "decoder.synth.paddings.0.pad", "decoder.synth.paddings.1.pad", "decoder.synth.paddings.2.pad".

    opened by xxrraa 6
  • Protocol buffers error: TypeError: Descriptors cannot not be created directly.

    Protocol buffers error: TypeError: Descriptors cannot not be created directly.

    After a fresh install of miniconda and updating conda itself (Linux, Ubuntu) I'm getting the following error when trying to train a RAVE model.

    Prepending PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python to the training command gets it running. But is this expected behavior? Seems to be some kind of compatibility issue with the latest version of python...?

    $ CUDA_VISIBLE_DEVICES=1 python train_rave.py --name blonk_nolatency_1 --wav ~/datasets/unconditional/Blonk/Vocalor/ --preprocessed ../jobs/rave/blonk_nolatency_1/tmp --sr 48000 --data-size 16 --no-latency true                                     
    
    Traceback (most recent call last):
      File "/its/home/user/RAVE/train_rave.py", line 4, in <module>
        from rave.model import RAVE
      File "/its/home/user/RAVE/rave/__init__.py", line 1, in <module>
        from .model import RAVE
      File "/its/home/user/RAVE/rave/model.py", line 5, in <module>
        import pytorch_lightning as pl
      File "/its/home/user/.local/lib/python3.9/site-packages/pytorch_lightning/__init__.py", line 30, in <module>
        from pytorch_lightning.callbacks import Callback  # noqa: E402
      File "/its/home/user/.local/lib/python3.9/site-packages/pytorch_lightning/callbacks/__init__.py", line 26, in <module>
        from pytorch_lightning.callbacks.pruning import ModelPruning
      File "/its/home/user/.local/lib/python3.9/site-packages/pytorch_lightning/callbacks/pruning.py", line 31, in <module>
        from pytorch_lightning.core.lightning import LightningModule
      File "/its/home/user/.local/lib/python3.9/site-packages/pytorch_lightning/core/__init__.py", line 16, in <module>
        from pytorch_lightning.core.lightning import LightningModule
      File "/its/home/user/.local/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 40, in <module>
        from pytorch_lightning.loggers import LightningLoggerBase
      File "/its/home/user/.local/lib/python3.9/site-packages/pytorch_lightning/loggers/__init__.py", line 18, in <module>
        from pytorch_lightning.loggers.tensorboard import TensorBoardLogger
      File "/its/home/user/.local/lib/python3.9/site-packages/pytorch_lightning/loggers/tensorboard.py", line 26, in <module>
        from torch.utils.tensorboard import SummaryWriter
      File "/its/home/user/.local/lib/python3.9/site-packages/torch/utils/tensorboard/__init__.py", line 10, in <module>
        from .writer import FileWriter, SummaryWriter  # noqa: F401
      File "/its/home/user/.local/lib/python3.9/site-packages/torch/utils/tensorboard/writer.py", line 9, in <module>
        from tensorboard.compat.proto.event_pb2 import SessionLog
      File "/its/home/user/.local/lib/python3.9/site-packages/tensorboard/compat/proto/event_pb2.py", line 17, in <module>
        from tensorboard.compat.proto import summary_pb2 as tensorboard_dot_compat_dot_proto_dot_summary__pb2
      File "/its/home/user/.local/lib/python3.9/site-packages/tensorboard/compat/proto/summary_pb2.py", line 17, in <module>
        from tensorboard.compat.proto import tensor_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__pb2
      File "/its/home/user/.local/lib/python3.9/site-packages/tensorboard/compat/proto/tensor_pb2.py", line 16, in <module>
        from tensorboard.compat.proto import resource_handle_pb2 as tensorboard_dot_compat_dot_proto_dot_resource__handle__pb2
      File "/its/home/user/.local/lib/python3.9/site-packages/tensorboard/compat/proto/resource_handle_pb2.py", line 16, in <module>
        from tensorboard.compat.proto import tensor_shape_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__shape__pb2
      File "/its/home/user/.local/lib/python3.9/site-packages/tensorboard/compat/proto/tensor_shape_pb2.py", line 36, in <module>
        _descriptor.FieldDescriptor(
      File "/its/home/user/.local/lib/python3.9/site-packages/google/protobuf/descriptor.py", line 560, in __new__
        _message.Message._CheckCalledFromGeneratedFile()
    TypeError: Descriptors cannot not be created directly.
    If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
    If you cannot immediately regenerate your protos, some other possible workarounds are:
     1. Downgrade the protobuf package to 3.20.x or lower.
     2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
    
    More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
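
    For reference, the two workarounds listed at the bottom of the message translate to the shell commands below (the exact protobuf pin is illustrative; any 3.20.x or earlier release satisfies the suggestion):

        # workaround 1: downgrade protobuf, as the error message suggests
        pip install "protobuf==3.20.*"
        # workaround 2: force the pure-Python parser (slower) for this run only
        PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python python train_rave.py ...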
    
    
    opened by jreus 5
  • Question about loudness-based loss

    Question about loudness-based loss

    https://github.com/acids-ircam/RAVE/blob/54c6106eca9760041e7d01e10ba1e9f51f04fc5a/rave/core.py#L130

    I understand that the loudness-based distance is just an approximation, but I wonder why it's added to the log spectrogram instead of being multiplied in. Furthermore, what about using a time-domain prefilter as implemented by @csteinmetz1:

    https://github.com/csteinmetz1/auraloss/blob/e732234398ada867138be634dbf66f40360461a2/auraloss/perceptual.py#L124-L129

    and then implement it like:

    mse(self.log_stft(time_domain_prefilter(x)), self.log_stft(time_domain_prefilter(y)))
    

    Can you comment on some pros and cons of either way?
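
    For concreteness, here is a sketch of the proposed alternative in PyTorch. It assumes auraloss's FIRFilter (its A-weighting option and (input, target) forward signature, per the link above); the log-STFT helper and its parameters are illustrative:

        import torch
        from auraloss.perceptual import FIRFilter

        # time-domain A-weighting-style FIR prefilter, applied to both signals
        prefilter = FIRFilter(filter_type="aw", fs=44100)

        def log_stft(x, n_fft=2048, hop=512):
            # log-magnitude STFT; x has shape (batch, 1, time)
            window = torch.hann_window(n_fft)
            s = torch.stft(x.squeeze(1), n_fft, hop, window=window,
                           return_complex=True).abs()
            return torch.log1p(s)

        def prefiltered_distance(x, y):
            x, y = prefilter(x, y)  # filter prediction and target identically
            return torch.nn.functional.mse_loss(log_stft(x), log_stft(y))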

    opened by faroit 0
  • lmdb.InvalidParameterError: Invalid argument

    lmdb.InvalidParameterError: Invalid argument

    When trying to run train_rave.py:

    train_rave.py -c small --name test1 --wav wave_data --preprocessed out_dir --sr 16000 --data-size 16 --n-signal 65536 --no-latency false --cropped-latent-size 128 --max-kl 0.1

    I am getting the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "~/python3.10/site-packages/udls/simple_dataset.py", line 66, in __init__
        self.env = SimpleLMDBDataset(out_database_location, map_size)
      File "~/python3.10/site-packages/udls/base_dataset.py", line 12, in __init__
        self.env = lmdb.open(out_database_location,
    lmdb.InvalidParameterError: out_dir: Invalid argument

    It seems to be related to the --preprocessed argument. Is this just a directory path, or should I pass something else?

    opened by schramm 1
  • vast.ai error

    vast.ai error

    I am trying to run RAVE on vast.ai. However, whether I run it via a notebook or via the CLI, I get the following error:

    Traceback (most recent call last):
      File "/workspace/RAVE/train_rave.py", line 156, in <module>
        nepoch = args.VAL_EVERY // len(train)
    ZeroDivisionError: integer division or modulo by zero

    I followed the instructions, plus ran these two commands, as it couldn't find libsndfile otherwise:

        sudo apt-get update
        sudo apt-get install libsndfile1

    How can we solve this? It would be great if we could run RAVE on vast.ai; it would be cheaper and, if desired, also faster. Thanks.

    opened by x319x 0
  • Receiving ValueError when training: n_components must be between 0 and 32

    Receiving ValueError when training: n_components must be between 0 and 32

    ValueError for n_components:

    File "/content/miniconda/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 457, in _fit
        return self._fit_full(X, n_components)
      File "/content/miniconda/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 475, in _fit_full
        raise ValueError(
    ValueError: n_components=128 must be between 0 and min(n_samples, n_features)=32 with svd_solver='full'
    
    opened by nightshining 0
  • How many epochs of training should I expect?

    How many epochs of training should I expect?

    I've been running training on a set of audio files and am wondering how I should assess how training is going.

    After about 24 hours, I'm at about 13,000 epochs. I'm not sure how to interpret the TensorBoard visualizations; any pointers would be very much appreciated.

    /content/drive/MyDrive/RAVE_COLLAB
    Recursive search in /content/drive/MyDrive/RAVE_COLLAB/resampled/parbass/
    audio_00158_00000.wav: 100% 159/159 [00:04<00:00, 33.67it/s] 
    /content/miniconda/lib/python3.9/site-packages/torch/utils/data/dataloader.py:487: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
      warnings.warn(_create_warning_msg(
    GPU available: True, used: True
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs
    Restoring states from the checkpoint path at /content/drive/MyDrive/RAVE_COLLAB/runs/parbass/rave/version_2/checkpoints/last-v1.ckpt
    /content/miniconda/lib/python3.9/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:342: UserWarning: The dirpath has changed from 'runs/parbass/rave/version_2/checkpoints' to 'runs/parbass/rave/version_3/checkpoints', therefore `best_model_score`, `kth_best_model_path`, `kth_value`, `last_model_path` and `best_k_models` won't be reloaded. Only `best_model_path` will be reloaded.
      warnings.warn(
    LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
    
      | Name          | Type                | Params
    ------------------------------------------------------
    0 | pqmf          | CachedPQMF          | 4.2 K 
    1 | loudness      | Loudness            | 0     
    2 | encoder       | Encoder             | 4.8 M 
    3 | decoder       | Generator           | 12.8 M
    4 | discriminator | StackDiscriminators | 16.9 M
    ------------------------------------------------------
    34.5 M    Trainable params
    0         Non-trainable params
    34.5 M    Total params
    138.092   Total estimated model params size (MB)
    Restored all states from the checkpoint file at /content/drive/MyDrive/RAVE_COLLAB/runs/parbass/rave/version_2/checkpoints/last-v1.ckpt
    /content/miniconda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1933: PossibleUserWarning: The number of training batches (19) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
      rank_zero_warn(
    Epoch 11571:   0% 0/20 [00:00<00:00, -106397.56it/s]/content/miniconda/lib/python3.9/site-packages/torch/utils/data/dataloader.py:487: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
      warnings.warn(_create_warning_msg(
    Epoch 12623:  95% 19/20 [00:04<?, ?it/s, v_num=3]
    Validation: 0it [00:00, ?it/s]
    Validation:   0% 0/1 [00:00<?, ?it/s]
    Validation DataLoader 0:   0% 0/1 [00:00<?, ?it/s]
    Epoch 12623: 100% 20/20 [00:04<00:00,  4.59s/it, v_num=3]
    Epoch 12624:   0% 0/19 [00:00<00:00, -111926.65it/s, v_num=3]/content/miniconda/lib/python3.9/site-packages/torch/utils/data/dataloader.py:487: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
      warnings.warn(_create_warning_msg(
    Epoch 12706:  63% 12/19 [00:03<-1:59:57, -2.16it/s, v_num=3]
    
    

    [three TensorBoard screenshots]

    opened by batchku 0
  • AttributeError: 'NoneType' object has no attribute 'astype'

    AttributeError: 'NoneType' object has no attribute 'astype'

    OK, so this might not be a big issue, but I can't seem to train RAVE on my computer because of this simple line:

        preprocess = lambda name: simple_audio_preprocess(
            args.SR,
            2 * args.N_SIGNAL,
        )(name).astype(np.float16)
    

    All I get is:

    Traceback (most recent call last):
      File "..\02_Models\RAVE\train_rave.py", line 99, in <module>
        dataset = SimpleDataset(
      File "C:\Users\User\AppData\Roaming\Python\Python39\site-packages\udls\simple_dataset.py", line 80, in __init__
        self._preprocess()
      File "C:\Users\User\AppData\Roaming\Python\Python39\site-packages\udls\simple_dataset.py", line 120, in _preprocess
        output = self.preprocess_function(wav)
      File "..\02_Models\RAVE\train_rave.py", line 94, in <lambda>
        preprocess = lambda name: simple_audio_preprocess(
    AttributeError: 'NoneType' object has no attribute 'astype'
    

    I feel a little bit lost here... I also tried to remove the cast to float16 and ended up with a "No data found !" error. My dataset comprises 1,802 stereo .wav files at 44100 Hz (15 GB).
    It's probably a silly problem, but I would really like to test this model!
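
    One way to narrow this down is to check which files librosa can actually decode before training: the traceback suggests simple_audio_preprocess returned None for at least one file, which typically happens when a file cannot be loaded. A minimal diagnostic sketch (the dataset path is a placeholder):

        import pathlib

        import librosa

        # try to decode the first second of every .wav file in the dataset
        for p in sorted(pathlib.Path("/path/to/dataset").rglob("*.wav")):
            try:
                librosa.load(str(p), sr=None, duration=1.0)
            except Exception as e:
                print(f"failed to load {p}: {e}")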

    opened by Monratus 1