Official implementation of Densely connected normalizing flows

Overview

This repository is the official implementation of the NeurIPS 2021 paper Densely connected normalizing flows. The poster is available here.

Setup

  • CUDA 11.1
  • Python 3.8
pip install -r requirements.txt
pip install -e .
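
A quick way to confirm the environment is set up correctly is to import the package and check that the GPU is visible. This is a minimal sketch, assuming the package installs under the denseflow name used by the import paths elsewhere in this repository:

import torch
import denseflow

# Expect a PyTorch build matching CUDA 11.1 and at least one visible GPU.
print(torch.__version__, torch.cuda.is_available())
# Confirms that the editable install (pip install -e .) resolved correctly.
print(denseflow.__file__)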

Training

cd ./experiments/image

CIFAR-10:

python train.py --epochs 400 --batch_size 64 --optimizer adamax --lr 1e-3  --gamma 0.9975 --warmup 5000  --eval_every 1 --check_every 10 --dataset cifar10 --augmentation eta --block_conf 6 4 1 --layers_conf  5 6 20  --layer_mid_chnls 48 48 48 --growth_rate 10  --name DF_74_10
python train_more.py --model ./log/cifar10_8bit/densenet-flow/expdecay/DF_74_10 --new_lr 2e-5 --new_epochs 420
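
The interaction of --lr, --warmup, and --gamma is not documented here; the sketch below is a hypothetical illustration of the schedule these flags suggest (linear warmup over optimizer steps, then per-epoch exponential decay, consistent with the expdecay log directory). The function name and signature are illustrative, not the repository's API.

def learning_rate(base_lr, step, epoch, warmup_steps, gamma):
    """Assumed schedule: linear warmup over steps, then per-epoch exponential decay."""
    warmup_factor = min(1.0, step / warmup_steps)  # ramps from 0 to 1 over the first warmup_steps iterations
    decay_factor = gamma ** epoch                  # multiplied once per completed epoch
    return base_lr * warmup_factor * decay_factor

# CIFAR-10 settings: --lr 1e-3 --warmup 5000 --gamma 0.9975
print(learning_rate(1e-3, step=2500, epoch=0, warmup_steps=5000, gamma=0.9975))    # 5.0e-4 (mid-warmup)
print(learning_rate(1e-3, step=10000, epoch=100, warmup_steps=5000, gamma=0.9975)) # ~7.8e-4 after 100 epochs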

ImageNet32:

python train.py --epochs 20 --batch_size 64 --optimizer adamax --lr 1e-3  --gamma 0.95 --warmup 5000  --eval_every 1 --check_every 10 --dataset imagenet32 --augmentation eta --block_conf 6 4 1 --layers_conf  5 6 20  --layer_mid_chnls 48 48 48 --growth_rate 10  --name DF_74_10
python train_more.py --model ./log/imagenet32_8bit/densenet-flow/expdecay/DF_74_10 --new_lr 2e-5 --new_epochs 22

ImageNet64:

python train.py --epochs 10 --batch_size 32 --optimizer adamax --lr 1e-3  --gamma 0.95 --warmup 5000  --eval_every 1 --check_every 10 --dataset imagenet64 --augmentation eta --block_conf 6 4 1 --layers_conf  5 6 20  --layer_mid_chnls 48 48 48 --growth_rate 10  --name DF_74_10
python train_more.py --model ./log/imagenet64_8bit/densenet-flow/expdecay/DF_74_10 --new_lr 2e-5 --new_epochs 11

CelebA:

python train.py --epochs 50 --batch_size 32 --optimizer adamax --lr 1e-3  --gamma 0.95 --warmup 5000  --eval_every 1 --check_every 10 --dataset celeba --augmentation horizontal_flip --block_conf 6 4 1 --layers_conf  5 6 20  --layer_mid_chnls 48 48 48 --growth_rate 10  --name DF_74_10
python train_more.py --model ./log/celeba_8bit/densenet-flow/expdecay/DF_74_10 --new_lr 2e-5 --new_epochs 55

Note: Download instructions for ImageNet and CelebA are provided in denseflow/data/datasets/image/{dataset}.py

Evaluation

CIFAR-10:

python eval_loglik.py --model PATH_TO_MODEL --k 1000 --kbs 50

ImageNet32:

python eval_loglik.py --model PATH_TO_MODEL --k 200 --kbs 50

ImageNet64 and CelebA:

python eval_loglik.py --model PATH_TO_MODEL --k 200 --kbs 25
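
eval_loglik.py reports test log-likelihood in bits per dimension (BPD); the exact evaluation lives in denseflow/utils/loss.py. The sketch below only illustrates the quantities involved, assuming --k is the number of importance samples per image and --kbs the number evaluated per forward pass; the function names here are illustrative, not the repository's exact code.

import math
import torch

def bpd(log_prob, x):
    """Convert summed log-likelihood (in nats) to bits per dimension."""
    return -log_prob.sum() / (math.log(2) * x.numel())

def iwbo(model, x, k, kbs):
    """Importance-weighted bound: logsumexp of k stochastic log_prob evaluations minus log k.
    Assumes k is divisible by kbs."""
    log_w = []
    for _ in range(k // kbs):
        x_rep = x.repeat_interleave(kbs, dim=0)           # evaluate kbs samples per image at a time
        log_w.append(model.log_prob(x_rep).view(x.shape[0], kbs))
    log_w = torch.cat(log_w, dim=1)                       # shape (batch, k)
    return torch.logsumexp(log_w, dim=1) - math.log(k)    # per-image lower bound on log p(x)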

Model weights

Model weights are stored here.

Samples generation

Generated samples are stored in PATH_TO_MODEL/samples.

python eval_sample.py --model PATH_TO_MODEL

Note: PATH_TO_MODEL must contain the check directory.
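
Conceptually, a normalizing flow generates images by drawing noise from the base distribution and applying the inverse transform; the discussion in the comments below mentions sampling with a reduced base scale such as N(0, 0.8). The snippet is a hypothetical sketch of that idea; flow.inverse and base_shape are illustrative names, not the repository's API.

import torch

@torch.no_grad()
def sample(flow, num_samples, base_shape, temperature=1.0):
    # z ~ N(0, temperature^2 * I); a temperature below 1 trades diversity for per-sample quality
    z = temperature * torch.randn(num_samples, *base_shape)
    return flow.inverse(z)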

ImageNet 32x32

[Sample grid: generated ImageNet 32x32 images]

ImageNet 64x64

[Sample grid: generated ImageNet 64x64 images]

CelebA

[Sample grid: generated CelebA images]

Acknowledgements

A significant part of this code builds on the SurVAE [1] implementation, which is available under the MIT license.

References

[1] Didrik Nielsen, Priyank Jaini, Emiel Hoogeboom, Ole Winther, and Max Welling. SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020).

Comments
  • Noise growth rate ablation

    Hi, thanks for this great work! I was recently researching the effect of augmentation noise.

    My intuition is that a larger growth rate results in larger model capacity. However, after changing DF_74_10 to DF_74_50, I observe that convergence becomes much slower than for DF_74_10. I was wondering if you could share any experience on the selection of the growth rate. :-)

    opened by prclibo 6
  • Questions Regarding Article

    Congratulations on very good work! I'm very impressed with what you've been able to accomplish, especially taking the limited compute budget into account!

    Question 0. In Table 1 you report the model parameters and training time for three datasets, but not ImageNet64. Do you have the time/parameter numbers for ImageNet64? Are they the same as ImageNet32?

    Question 1. FID computation. You write that for image generation you sample from N(0, 0.8). Did you compute FID using this N(0, 0.8) or N(0, I)? I ask because I believe FID is slightly biased in favor of GANs and denoising diffusion models, because they usually trade off quality of individual images for variability. You could find the sweet spot in this trade-off for DenseFlow by computing FID for N(0, alpha) where alpha = 0.5, 0.6, ..., 1. I'd be curious to see the best number you'd be able to get.

    Question 2. Since NFs sample ~num_pixels times faster than autoregressive models, I'm curious whether we could improve samples at the cost of 10-20x longer sampling time. For example, we could sample a batch of 128 fake images, then run SGD maximizing the LLH wrt the 128 fake images. Even if we did 100 SGD steps we'd still be ~num_pixels/100 times faster than autoregressive models.

    Question 3. Do you think increased network width worked better on ImageNet32 or ImageNet64? I ask because I like to think of ImageNet64 as 4 channel-wise copies of ImageNet32x32, which effectively increases the network width. I imagine if we go to 128x128 or 256x256 the network depth issue of Normalizing Flows may be smaller. What do you think?

    Question 4. Did you find batch_size=32 to optimize better for ImageNet64, or was this mainly for memory savings?

    Question 5. Did you try using gradient checkpointing?

    Question 6. Did you try training in float16?

    Question 7. How large was GPU utilization on ImageNet64 with batch_size=32? I.e., if you wrote nvidia-smi would it tell you 50% or 100%?

    opened by alexm-gc 3
  • Unable to reproduce CIFAR-10 results

    Hi, I'm having a hard time reproducing the CIFAR-10 results (I haven't tried the others yet). I am consistently getting around 3.1 bpd on the eval set and can get around 2.9 bpd on the training set. I've even tried training longer and with different scheduling than is described in the appendix. Was this the best result that you got or an average? Do you have a variance? Do you have a checkpoint that could be released?

    opened by stevenwalton 1
  • Tensor dimension mis-match during evaluation and training

    Hi! I'm seeing this error when using the evaluation and training commands for ImageNet32:

    Loaded weights for model at 22/22 epochs
    Traceback (most recent call last):
      File "/content/DenseFlow/experiments/image/eval_loglik.py", line 72, in <module>
        bpd = dataset_elbo_bpd(model, eval_loader, device=device, double=eval_args.double)
      File "/content/DenseFlow/denseflow/utils/loss.py", line 81, in dataset_elbo_bpd
        bpd += elbo_bpd(model, x).cpu().item() * len(x)
      File "/content/DenseFlow/denseflow/utils/loss.py", line 28, in elbo_bpd
        return loglik_bpd(model, x)
      File "/content/DenseFlow/denseflow/utils/loss.py", line 12, in loglik_bpd
        return - model.log_prob(x).sum() / (math.log(2) * x.shape.numel())
      File "/content/DenseFlow/denseflow/flows/flow.py", line 33, in log_prob
        x, ldj = transform(x)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/content/DenseFlow/experiments/image/model/flow_modules.py", line 127, in forward
        eta, delta_ll = self.cross_unit_coupling(g_in, self.blocks[i-1])
      File "/content/DenseFlow/experiments/image/model/flow_modules.py", line 114, in cross_unit_coupling
        eta_hat = mu + torch.exp(log_scale) * eta
    RuntimeError: The size of tensor a (64) must match the size of tensor b (16) at non-singleton dimension 3

    and with --k 200:

      File "/content/DenseFlow/experiments/image/eval_loglik.py", line 75, in <module>
        bpd = dataset_iwbo_bpd(model, eval_loader, k=eval_args.k, kbs=eval_args.kbs, device=device, double=eval_args.double)
      File "/content/DenseFlow/denseflow/utils/loss.py", line 106, in dataset_iwbo_bpd
        bpd += iwbo_bpd(model, x, k=k, kbs=kbs).cpu().item() * len(x)
      File "/content/DenseFlow/denseflow/utils/loss.py", line 57, in iwbo_bpd
        if kbs: return - iwbo_batched(model, x, k, kbs).sum() / (x.numel() * math.log(2))
      File "/content/DenseFlow/denseflow/utils/loss.py", line 44, in iwbo_batched
        ll_stack = model.log_prob(x_stack)
      File "/content/DenseFlow/denseflow/flows/flow.py", line 33, in log_prob
        x, ldj = transform(x)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/content/DenseFlow/experiments/image/model/flow_modules.py", line 127, in forward
        eta, delta_ll = self.cross_unit_coupling(g_in, self.blocks[i-1])
      File "/content/DenseFlow/experiments/image/model/flow_modules.py", line 114, in cross_unit_coupling
        eta_hat = mu + torch.exp(log_scale) * eta
    RuntimeError: The size of tensor a (3200) must match the size of tensor b (16) at non-singleton dimension 3

    I'm able to generate samples but not get the bpd scores.

    opened by meghana17 1
  • Confused by BPD, Hardware and FID?

    I was comparing DenseFlow against VDM on ImageNet64x64.

    DenseFlow: 3.35 BPD, 130M parameters, 1 V100 for ~2 weeks
    VDM: 3.4 BPD, ?M parameters, 128 TPUv3 for ? weeks

    It looks like DenseFlow gets better BPD with ~100x less compute, at the cost of worse FID.

    Question 0. Do you know what the training loss of DenseFlow/VDM is? I imagine low training loss leads to good FID (perhaps VDM has train=1.0 bpd and valid=3.4 bpd, while DenseFlow has train=3.3 and valid=3.4).

    Question 1. Did you make any test cases for the BPD computation? It just sounds too good to be true that we can get better BPD with 100x less compute.

    Question 2. It may be that there is a trade-off between BPD and FID. That is, DenseFlow gets good BPD but bad FID, while VDM gets good FID and worse BPD. Do you believe this is the case? If so, what do you believe causes this phenomenon?

    opened by alexm-gc 2