Annotated, understandable, and visually interpretable PyTorch implementations of: VAE, BIRVAE, NSGAN, MMGAN, WGAN, WGANGP, LSGAN, DRAGAN, BEGAN, RaGAN, InfoGAN, fGAN, FisherGAN

Shayne O'Brien

Last update: Dec 16, 2022

Related tags

Deep Learning python machine-learning pytorch discriminator generative-adversarial-network gan infogan autoencoder vae wasserstein wgan lsgan began generative-models dragan fishergan mmgan nsgan ragan fgan

Overview

PyTorch 0.4.1 | Python 3.6.5

Annotated implementations with comparative introductions for minimax, non-saturating, wasserstein, wasserstein gradient penalty, least squares, deep regret analytic, bounded equilibrium, relativistic, f-divergence, Fisher, and information generative adversarial networks (GANs), and standard, variational, and bounded information rate variational autoencoders (VAEs).

Paper links are supplied at the beginning of each file with a short summary of the paper. See src folder for files to run via terminal, or notebooks folder for Jupyter notebook visualizations via your local browser. The main file changes can be see in the train, train_D, and train_G of the Trainer class, although changes are not completely limited to only these two areas (e.g. Wasserstein GAN clamps weight in the train function, BEGAN gives multiple outputs from train_D, fGAN has a slight modification in viz_loss function to indicate method used in title).

All code in this repository operates in a generative, unsupervised manner on binary (black and white) MNIST. The architectures are compatible with a variety of datatypes (1D, 2D, square 3D images). Plotting functions work with binary/RGB images. If a GPU is detected, the models use it. Otherwise, they default to CPU. VAE Trainer classes contain methods to visualize latent space representations (see make_all function).

Usage

To initialize an environment:

python -m venv env  
. env/bin/activate  
pip install -r requirements.txt

For playing around in Jupyer notebooks:

jupyter notebook

To run from Terminal:

cd src
python bir_vae.py

New Models

One of the primary purposes of this repository is to make implementing deep generative model (i.e., GAN/VAE) variants as easy as possible. This is possible because, typically but not always (e.g. BIRVAE), the proposed modifications only apply to the way loss is computed for backpropagation. Thus, the core training class is structured in such a way that most new implementations should only require edits to the train_D and train_G functions of GAN Trainer classes, and the compute_batch function of VAE Trainer classes.

Suppose we have a non-saturating GAN and we wanted to implement a least-squares GAN. To do this, all we have to do is change two lines:

Original (NSGAN)

def train_D(self, images):
  ...
  D_loss = -torch.mean(torch.log(DX_score + 1e-8) + torch.log(1 - DG_score + 1e-8))

  return D_loss

def train_G(self, images):
  ...
  G_loss = -torch.mean(torch.log(DG_score + 1e-8))

  return G_loss

New (LSGAN)

def train_D(self, images):
  ...
  D_loss = (0.50 * torch.mean((DX_score - 1.)**2)) + (0.50 * torch.mean((DG_score - 0.)**2))

  return D_loss

def train_G(self, images):
  ...
  G_loss = 0.50 * torch.mean((DG_score - 1.)**2)

  return G_loss

Model Architecture

The architecture chosen in these implementations for both the generator (G) and discriminator (D) consists of a simple, two-layer feedforward network. While this will give sensible output for MNIST, in practice it is recommended to use deep convolutional architectures (i.e. DCGANs) to get nicer outputs. This can be done by editing the Generator and Discriminator classes for GANs, or the Encoder and Decoder classes for VAEs.

Visualization

All models were trained for 25 epochs with hidden dimension 400, latent dimension 20. Other implementation specifics are as close to the respective original paper (linked) as possible.

Model	Epoch 1	Epoch 25	Progress	Loss
MMGAN
NSGAN
WGAN
WGPGAN
DRAGAN
BEGAN
LSGAN
RaNSGAN
FisherGAN
InfoGAN
f-TVGAN
f-PearsonGAN
f-JSGAN
f-ForwGAN
f-RevGAN
f-HellingerGAN
VAE
BIRVAE

To Do

Models: CVAE, denoising VAE, adversarial autoencoder | Bayesian GAN, Self-attention GAN, Primal-Dual Wasserstein GAN
Architectures: Add DCGAN option
Datasets: Beyond MNIST

You might also like...

CVPR 2021: "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE"

Diverse Structure Inpainting ArXiv | Papar | Supplementary Material | BibTex This repository is for the CVPR 2021 paper, "Generating Diverse Structure

152 Nov 4, 2022

Official code for "End-to-End Optimization of Scene Layout" -- including VAE, Diff Render, SPADE for colorization (CVPR 2020 Oral)

End-to-End Optimization of Scene Layout Code release for: End-to-End Optimization of Scene Layout CVPR 2020 (Oral) Project site, Bibtex For help conta

41 Dec 9, 2022

3D AffordanceNet is a 3D point cloud benchmark consisting of 23k shapes from 23 semantic object categories, annotated with 56k affordance annotations and covering 18 visual affordance categories.

3D AffordanceNet This repository is the official experiment implementation of 3D AffordanceNet benchmark. 3D AffordanceNet is a 3D point cloud benchma

49 Dec 1, 2022

Comments

vae.py line 320 use this to plot

replace these lines - plt.scatter(m1s, m2s, c=labels) plt.legend([str(i) for i in set(labels)])

with the lines below - scatter = plt.scatter(m1s, m2s, c=labels) plt.legend(handles=scatter.legend_elements()[0], labels=set(labels))

opened by djaym7 0
WGAN-GP mispelling

In the notebook '04-wasserstein-gan-gradient-penalty.ipynb', the name of the model, 'WGANGP' and the trainer 'WGANGPTrainer' are incorrect concerning the corresponding src file (w_gp_gan.py). It should be changed into 'WGPGAN' and 'WGPGANTrainer'.

opened by emmaducos 0

Annotated, understandable, and visually interpretable PyTorch implementations of: VAE, BIRVAE, NSGAN, MMGAN, WGAN, WGANGP, LSGAN, DRAGAN, BEGAN, RaGAN, InfoGAN, fGAN, FisherGAN

Related tags

Overview

Overview

Usage

New Models

Model Architecture

Visualization

To Do

You might also like...

CVPR 2021: "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE"

Official code for "End-to-End Optimization of Scene Layout" -- including VAE, Diff Render, SPADE for colorization (CVPR 2020 Oral)

Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"

The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

Summary Explorer is a tool to visually explore the state-of-the-art in text summarization.

GuideDog is an AI/ML-based mobile app designed to assist the lives of the visually impaired, 100% voice-controlled

Annotated notes and summaries of the TensorFlow white paper, along with SVG figures and links to documentation

CCPD: a diverse and well-annotated dataset for license plate detection and recognition

3D AffordanceNet is a 3D point cloud benchmark consisting of 23k shapes from 23 semantic object categories, annotated with 56k affordance annotations and covering 18 visual affordance categories.

Comments

vae.py line 320 use this to plot

WGAN-GP mispelling

Owner

Shayne O'Brien

Dynamic vae - Dynamic VAE algorithm is used for anomaly detection of battery data

BEGAN in PyTorch

Official implementation of "GS-WGAN: A Gradient-Sanitized Approach for Learning Differentially Private Generators" (NeurIPS 2020)

PyTorch Autoencoders - Implementing a Variational Autoencoder (VAE) Series in Pytorch.

Collection of generative models, e.g. GAN, VAE in Pytorch and Tensorflow.

PyTorch package for the discrete VAE used for DALL·E.

Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021

PyTorch implementation for the visual prior component (i.e. perception module) of the Visually Grounded Physics Learner [Li et al., 2020].

A PyTorch-based open-source framework that provides methods for improving the weakly annotated data and allows researchers to efficiently develop and compare their own methods.

VideoGPT: Video Generation using VQ-VAE and Transformers