Clockwork Variational Autoencoder

Overview

Clockwork Variational Autoencoders (CW-VAE)

Vaibhav Saxena, Jimmy Ba, Danijar Hafner

If you find this code useful, please reference it in your paper:

@article{saxena2021clockworkvae,
  title={Clockwork Variational Autoencoders}, 
  author={Saxena, Vaibhav and Ba, Jimmy and Hafner, Danijar},
  journal={arXiv preprint arXiv:2102.09532},
  year={2021},
}

Method

Clockwork VAEs are deep generative models that learn long-term dependencies in video by leveraging hierarchies of representations that progress at different clock speeds. In contrast to prior video prediction methods, which typically focus on predicting sharp but short sequences into the future, Clockwork VAEs can accurately predict high-level content, such as object positions and identities, for 1000 frames.
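
The clock-speed idea can be pictured with a minimal sketch (plain Python, not the repo's code; k is the temporal abstraction factor):

# Level l ticks only every k**l steps, so higher levels change slowly
# and can carry long-range content such as object identities.
k = 2
levels = 3
states = [None] * levels
for t in range(8):
    active = [l for l in range(levels) if t % (k ** l) == 0]
    for l in reversed(active):  # higher levels tick first, conditioning the lower ones
        states[l] = (l, t)      # placeholder for the level-l state update
    print(t, active)            # t=0 -> [0, 1, 2], t=1 -> [0], t=2 -> [0, 1], ...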

Clockwork VAEs build upon the Recurrent State Space Model (RSSM), so each state contains a deterministic component for long-term memory and a stochastic component for sampling diverse plausible futures. Clockwork VAEs are trained end-to-end to optimize the evidence lower bound (ELBO) that consists of a reconstruction term for each image and a KL regularizer for each stochastic variable in the model.
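
In symbols, the objective has the familiar hierarchical-VAE form (a schematic rendering with assumed notation; see the paper for the exact conditioning of the posterior q and prior p):

\text{ELBO} = \mathbb{E}_q\Big[\textstyle\sum_{t=1}^{T} \log p\big(x_t \mid s_t^{1:L}\big)\Big] - \sum_{l=1}^{L} \sum_{t \in \mathcal{T}_l} \mathbb{E}_q\Big[\mathrm{KL}\big(q(s_t^{l} \mid \cdot)\,\big\|\,p(s_t^{l} \mid \cdot)\big)\Big]

where s_t^l is the level-l latent state at step t and \mathcal{T}_l is the set of steps at which level l is active, so each image contributes one reconstruction term and each stochastic variable one KL term.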

More information: https://arxiv.org/abs/2102.09532

Instructions

This repository contains the code for training the Clockwork VAE model on the datasets minerl, mazes, and mmnist.

The datasets will automatically be downloaded into the --datadir directory.

python3 train.py --logdir /path/to/logdir --datadir /path/to/datasets --config configs/<dataset>.yml 
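
For example, to train on Moving MNIST (the paths here are placeholders):

python3 train.py --logdir ./logdir/mmnist --datadir ./data --config configs/mmnist.yml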

The evaluation script writes open-loop video predictions in both PNG and NPZ format, along with PSNR and SSIM plots, to the data directory.

python3 eval.py --logdir /path/to/logdir
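
To post-process the NPZ predictions yourself, they can be opened with NumPy (the file name and array keys below are hypothetical; list .files on your own outputs to see what is stored):

import numpy as np

preds = np.load('open_loop_predictions.npz')  # hypothetical file name
print(preds.files)                            # keys of the stored arrays
frames = preds[preds.files[0]]                # e.g. an array of predicted frames
print(frames.shape, frames.dtype)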

Comments
  • Question: Does the model condition on future time steps?

    Hi, I read your paper and I found it really interesting. I've been going through the code to understand it, port it and try it out.

    I came across a section that I don't understand though:

    From reading the paper I assumed that the model should only condition on information from the past to predict the future. Is this assumption correct first of all?

    If so: in the convolutional encoder, you're predicting encodings for each RSSM level in the Clockwork VAE stack, and then summing them over time here: https://github.com/vaibhavsaxena11/cwvae/blob/62dd5050d3c20c1c40879539906c54492a756b59/cnns.py#L140

    Does this not mean that some prediction at time step X then has access to information from time step X+N (where N is some positive integer < "temporal abstraction factor" ** level), or am I misunderstanding the data flow?
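
    For concreteness, the summing in question can be pictured with a toy NumPy snippet (hypothetical shapes, not the repo's exact code): per-frame embeddings are grouped into windows of k**level consecutive frames and reduced to one vector per window, which is what makes the look-ahead within a window possible.

    import numpy as np

    T, D, k, level = 8, 4, 2, 2                     # frames, embed size, abstraction factor, level
    emb = np.random.randn(T, D)                     # one embedding per frame
    w = k ** level                                  # window length at this level
    summed = emb.reshape(T // w, w, D).sum(axis=1)  # one input per window of w frames
    print(summed.shape)                             # (2, 4)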

    opened by llucid-97 2
  • potential bug in the encoder

    for level in range(1, self._levels):
        for i_dl in range(self._dense_layers - 1):
            hidden = self.get('h{}_dense'.format(5 + (level - 1) * self._dense_layers + i_dl), tfkl.Dense, self._embed_size, activation=tf.nn.relu)(hidden)
        if self._dense_layers > 0:
            hidden = self.get('h{}_dense'.format(4 + level * self._dense_layers), tfkl.Dense, feat_size, activation=None)(hidden)
        layer = hidden

    From line 39 onwards in the cnns.py Encoder(), the depth of these layers increases with the level because the hidden variable is overwritten. At large n_levels and n_enc_dense_layers this results in a very deep network mapping from the observation embedding to the latent space. I'm not sure it's intentional, and it doesn't seem to have a purpose; is there a reason the higher latent spaces need a deeper function to map from the embedding?

    opened by xmax1 1
  • File encoding error

    While running train.py, I get the following error: Failed to encode example: {'video': 'filepath'}

    ffmpeg and ffmpeg-python have already been installed in the virtual environment.

    opened by sb93 1