MADE (Masked Autoencoder Density Estimation) implementation in PyTorch

Andrej

Last update: Dec 30, 2022

Overview

pytorch-made

This code is an implementation of "MADE: Masked Autoencoder for Distribution Estimation" by Germain et al., 2015. The core idea is that you can turn an autoencoder into an autoregressive density model just by appropriately masking the connections in the MLP: order the input dimensions in some way and make sure that each output only depends on inputs earlier in that ordering. Like other autoregressive models (char-rnn, PixelCNNs, etc.), evaluating the likelihood is very cheap (a single forward pass), but sampling needs one forward pass per dimension, so its cost is linear in the number of dimensions.
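To make the masking idea concrete, here is a minimal NumPy sketch (it is not the code in made.py). The helper names build_masks, forward and sample are made up for illustration, biases are left out, and the rule for picking hidden-unit degrees is one common choice that I am assuming rather than quoting from this repo. It builds the masks for one random ordering, checks that no output is connected to an input at or after its own position, and draws a sample with one forward pass per dimension.

```python
import numpy as np

def build_masks(nin, hidden_sizes, rng):
    """Return (masks, ordering); each mask is 0/1 with shape (fan_in, fan_out)."""
    # ordering[d] is the position of input d in the (random) ordering.
    degrees = [rng.permutation(nin)]
    for h in hidden_sizes:
        low = degrees[-1].min()
        # Hidden degrees lie in [low, nin - 2]; the upper bound is exclusive here.
        degrees.append(rng.integers(low, nin - 1, size=h))
    # A unit may see a unit in the previous layer with a degree <= its own.
    masks = [(d_in[:, None] <= d_out[None, :]).astype(np.float64)
             for d_in, d_out in zip(degrees[:-1], degrees[1:])]
    # Outputs use a strict inequality so output d never sees input d itself.
    masks.append((degrees[-1][:, None] < degrees[0][None, :]).astype(np.float64))
    return masks, degrees[0]

def forward(x, weights, masks):
    """Masked MLP without biases: ReLU hidden layers, raw logits at the output."""
    h = x
    for i, (W, M) in enumerate(zip(weights, masks)):
        h = h @ (W * M)
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

def sample(weights, masks, ordering, rng):
    """Draw one binary vector; needs len(ordering) forward passes."""
    x = np.zeros(len(ordering))
    # Fill in dimensions from first to last position in the ordering.
    for d in np.argsort(ordering):
        logit = forward(x, weights, masks)[d]
        p = 1.0 / (1.0 + np.exp(-logit))
        x[d] = float(rng.random() < p)
    return x

rng = np.random.default_rng(0)
nin = 6
masks, ordering = build_masks(nin, [16, 16], rng)
weights = [rng.normal(size=m.shape) for m in masks]

# Multiplying the masks counts paths from input j to output d.
connectivity = masks[0] @ masks[1] @ masks[2]
for d in range(nin):
    deps = [j for j in range(nin) if connectivity[j, d] > 0]
    print("output", d, "depends on inputs:", deps)

print("sample:", sample(weights, masks, ordering, rng))
```

The likelihood of a given x only needs the single call to forward, while sample has to call it once per dimension because each new value feeds into the next conditional.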

The authors of the paper also published code here, but it's a bit wordy, sprawling and in Theano. Hence my own shot at it with only ~150 lines of code and PyTorch <3.

examples

First we download the binarized MNIST dataset. Then we can reproduce the first point on the plot of Figure 2 by training a 1-layer MLP of 500 units with only a single mask, and using a single fixed (but random) ordering like so:

python run.py --data-path binarized_mnist.npz -q 500

which converges to a binary cross-entropy loss of 94.5, as shown in the paper. We can then simultaneously train a larger model ensemble (with weight sharing in the one MLP) and average over all of the models at test time. For instance, we can use 10 orderings (-n 10) and also average over the 10 at inference time (-s 10):

python run.py --data-path binarized_mnist.npz -q 500 -n 10 -s 10

which gives a much better test loss of 79.3, but at the cost of multiple forward passes. I was not able to reproduce the single-forward-pass gains that the paper alludes to when training with multiple masks; I might be doing something wrong.
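As a rough sketch of what averaging over orderings at test time can look like: the snippet below assumes the per-ordering logits are simply averaged before the loss is computed, and that the loss is the binary cross entropy summed over pixels and averaged over the batch. Both are assumptions on my part about how the numbers above are obtained, the function names are hypothetical, and random arrays stand in for the real forward passes.

```python
import numpy as np

def bce_with_logits(logits, x):
    """Binary cross entropy in nats, summed over dimensions, averaged over the batch."""
    # softplus(logit) - x * logit is the numerically stable form of the loss.
    per_dim = np.logaddexp(0.0, logits) - x * logits
    return per_dim.sum(axis=1).mean()

def ensemble_loss(logits_per_ordering, x):
    """logits_per_ordering has shape (S, B, D): one slice per ordering/mask set."""
    avg_logits = logits_per_ordering.mean(axis=0)
    return bce_with_logits(avg_logits, x)

rng = np.random.default_rng(0)
x = (rng.random((4, 784)) < 0.5).astype(np.float64)  # fake binarized batch
logits = rng.normal(size=(10, 4, 784))               # stand-in for 10 forward passes
print(ensemble_loss(logits, x))
```

Each of the S slices costs one forward pass with a different mask set, which is where the extra test-time cost comes from.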

usage

The core class is MADE, found in made.py. It inherits from PyTorch nn.Module so you can "slot it into" larger architectures quite easily. To instantiate MADE on 1D inputs of MNIST digits for example (which have 28*28 pixels), using one hidden layer of 500 neurons, and using a single but random ordering we would do:

model = MADE(28*28, [500], 28*28, num_masks=1, natural_ordering=False)

The reason we plug the size of the output (3rd argument) into MADE is that one might want to use relatively complicated output distributions. For example, a Gaussian distribution would normally be parameterized by a mean and a standard deviation for each dimension, or you could bin the output range into buckets and output logprobs for a softmax, or mixture parameters, etc. In the simplest example in this code we use binary predictions, which are only parameterized by one number each, hence the number of input dimensions happens to equal the number of outputs.
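For instance, with a Gaussian output one would presumably ask for twice as many outputs as inputs and split them into two parameters per dimension. The sketch below shows only that post-processing step in NumPy. The layout (all means first, then all log standard deviations) and the choice to predict the log of the standard deviation are assumptions for illustration, not something this README specifies, and gaussian_nll is a hypothetical helper.

```python
import numpy as np

def gaussian_nll(out, x):
    """Negative log-likelihood per example for a diagonal Gaussian.

    out has shape (B, 2 * nin); x has shape (B, nin).
    Assumed layout: out[:, :nin] are means, out[:, nin:] are log std devs.
    """
    nin = x.shape[1]
    mean, log_std = out[:, :nin], out[:, nin:]
    z = (x - mean) / np.exp(log_std)
    per_dim = 0.5 * z ** 2 + log_std + 0.5 * np.log(2.0 * np.pi)
    return per_dim.sum(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))
out = rng.normal(size=(4, 10))  # stand-in for a model with 2 * nin outputs
print(gaussian_nll(out, x))
```

Whatever layout is used, the k outputs belonging to dimension d must all share the mask of dimension d, so that every parameter of the d-th conditional depends only on earlier inputs.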

License

MIT

Comments

Reproduce example
Hello, I tried to reproduce the example of the first figure (3 inputs, 3 outputs and 2 dense layers of 4 neurons) and the mask is not the same (result: https://imgur.com/5P5C2fn).

output 0 depends on inputs: [] : OK
output 1 depends on inputs: [] : OK
output 2 depends on inputs: [0, 1] : OK

So two outputs remain without dependence. What could be the problem? Thank you.
opened by carlogarro 0

log-likelihood of order/connectivity-agnostic training

Here you simply average over the logits. However, did I understand correctly that the order/connectivity-agnostic training amounts to a mixture model with $p(x) = \sum_{o} p(x, o)$ ? If so, it appears that one needs to compute log-likelihood for each mask and then perform logsumexp to obtain the mixture log-likelihood.

opened by wangleiphy 0
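For reference, the mixture reading raised in this comment can be sketched as follows, assuming a uniform prior over S orderings so that log p(x) = logsumexp_s log p(x | o_s) - log S. The function name and the toy numbers are made up for illustration; this is not what the repo computes.

```python
import numpy as np

def mixture_log_likelihood(ll_per_ordering):
    """ll_per_ordering has shape (S, B): log p(x_b | o_s) for each ordering s.

    Returns log p(x_b) under a uniform mixture over the S orderings.
    """
    S = ll_per_ordering.shape[0]
    m = ll_per_ordering.max(axis=0)  # subtract the max for numerical stability
    lse = m + np.log(np.exp(ll_per_ordering - m).sum(axis=0))
    return lse - np.log(S)

# Toy per-ordering log-likelihoods for a batch of 2 examples and 3 orderings.
ll = np.array([[-80.0, -95.0],
               [-90.0, -94.0],
               [-85.0, -96.0]])
print(mixture_log_likelihood(ll))
print(ll.mean(axis=0))  # plain average of the log-likelihoods, for comparison
```

By Jensen's inequality the mixture log-likelihood is never below the plain average of the per-ordering log-likelihoods, and it can never exceed the best single ordering.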
