Learning Energy-Based Models by Diffusion Recovery Likelihood

Overview


Ruiqi Gao, Yang Song, Ben Poole, Ying Nian Wu, Diederik P. Kingma

Paper: https://arxiv.org/pdf/2012.08125

[Figure: samples generated by our model]

Requirements

Experiments can be run on a single GPU or a Google Cloud TPU v3-8. Requires Python >= 3.5. To install dependencies:

pip install -r requirements.txt

To compute FID/Inception scores, download the pre-computed dataset statistics from https://drive.google.com/file/d/1QOLyYHESflcdZu8CsBLZohZzC95HyukK/view?usp=sharing, unzip the file, and put the resulting folder in the root of this repo.

Train with 1 GPU

CIFAR-10:
python main.py --num_res_blocks=8 --n_batch_train=256

CelebA:
python main.py --problem=celeba --num_res_blocks=6 --beta_1=0.5 --n_batch_train=128

LSUN church_outdoor 64x64 / LSUN bedroom 64x64:
python main.py --problem=[lsun_church64/lsun_bedroom64] --n_batch_train=128

LSUN church_outdoor 128x128:
python main.py --problem=lsun_church128 --beta_1=0.5

LSUN bedroom 128x128:
python main.py --problem=lsun_bedroom128 --beta_1=0.5 --num_res_blocks=5

Compute full FID/IS scores after training on CIFAR-10:
python main.py --eval --num_res_blocks=8 --noise_scale=0.99 --fid_n_batch=2000

For faster training, reduce the value of num_res_blocks.
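For example, a quick smoke-test run might look like the following (illustrative values, not a recommended configuration; these flags all appear in config.py):

python main.py --num_res_blocks=2 --n_batch_train=64 --n_iters=1000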

Train with Google Cloud TPU

Add --tpu=True to the single-GPU commands above. You also need to set --tpu_name and --tpu_zone to the values shown in your Google Cloud console.
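For example (the TPU name and zone below are placeholders; substitute your own):

python main.py --tpu=True --tpu_name=my-tpu --tpu_zone=us-central1-a --num_res_blocks=8 --n_batch_train=256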

Pretrained models

https://drive.google.com/file/d/1eneA6T5jQIyVFLFSOrSfJvDeUJJMh9xk/view?usp=sharing

This checkpoint is for the T6 setting (num_diffusion_timesteps=6). The T1k setting will be uploaded soon!

Citation

If you find our work helpful to your research, please cite:

@article{gao2020learning,
  title={Learning Energy-Based Models by Diffusion Recovery Likelihood},
  author={Gao, Ruiqi and Song, Yang and Poole, Ben and Wu, Ying Nian and Kingma, Diederik P},
  journal={arXiv preprint arXiv:2012.08125},
  year={2020}
}
Comments
  • How long does it take to train on CIFAR10 using a single GPU/TPU?


Hi, this is a really excellent paper and codebase! I ran it and found that it takes about 1 hour (time=4258.89s) to train 500 iterations on a single GPU with num_res_blocks=5, n_batch_train=256.

In the paper, you say you train CIFAR-10 for 240k iterations (final version) or 50k (under-review version), so I wonder what the total training time is for CIFAR-10. Thanks so much!
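(Extrapolating linearly from the rate reported above, 4258.89 s per 500 iterations: 50k iterations would take about 100 x 4258.89 s ≈ 4.9 days, and 240k iterations about 480 x 4258.89 s ≈ 23.7 days on the same single GPU.)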

    opened by sndnyang 1
  • Can you explain why the energy is divided by b0?


    In the following code for computing the (unnormalized) log probability, the network output is divided by b0.

    https://github.com/ruiqigao/recovery_likelihood/blob/c77cc0511dedcb8d9ab928438d80acb62aeca96f/model.py#L154

    I wonder if there is a legitimate explanation for this division.

b0 is supposed to be the squared step size (step_size_square), which usually takes a very small value: https://github.com/ruiqigao/recovery_likelihood/blob/c77cc0511dedcb8d9ab928438d80acb62aeca96f/model.py#L184

I wonder whether dividing by b0 makes the gradients too large and harms training in some settings.
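For context, here is a minimal sketch of the conditional (recovery) log-density under discussion, following the paper's formulation log p(x | x_tilde) = f(x) - ||x - x_tilde||^2 / (2 sigma^2) + const; the function and argument names are illustrative, not the repo's exact API:

```python
import tensorflow as tf

def neg_log_p_conditional(net, x, x_tilde, sigma_sq, b0):
    # net(x) is the scalar energy f(x) per image.
    # In model.py the network output is additionally divided by b0
    # (mcmc_step_size_b_square), rescaling f relative to the quadratic term.
    f = net(x) / b0  # the division in question
    quad = tf.reduce_sum(tf.square(x - x_tilde), axis=[1, 2, 3]) / (2.0 * sigma_sq)
    return quad - f  # negative (unnormalized) conditional log-density
```

Since b0 is small (default 2e-4 in config.py), dividing by it scales both f and its input gradient up by 1/b0, which may be what the large-gradient concern above is pointing at.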

    opened by swyoon 0
  • How to train with multiple GPUs?


Hello! I tried to train the model with multiple GPUs. I found that you have released train_distributed.py, so I tried to use tf.distribute.MirroredStrategy() as the strategy to achieve distributed training, but I got the following error:

     RuntimeError: `merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.experimental_run()` is decorated with `@tf.function` (or contains a nested `@tf.function`), and `fn` contains a synchronization point, such as aggregating gradients. This behavior is not yet supported. Instead, please wrap the entire call `strategy.experimental_run(fn)` in a `@tf.function`, and avoid nested `tf.function`s that may potentially cross a synchronization boundary.
    

    Looking forward to your help!!
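A minimal sketch of the workaround the error message itself suggests: wrap the entire distributed call in a single @tf.function rather than decorating the inner per-replica step. The model and loss below are placeholders, not the repository's actual training code:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Placeholder model/optimizer; the real code would build the EBM here.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    optimizer = tf.keras.optimizers.Adam(1e-4)

def step_fn(x, y):
    # Per-replica step: deliberately NOT decorated with @tf.function.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x) - y))
    grads = tape.gradient(loss, model.trainable_variables)
    # apply_gradients aggregates across replicas (the synchronization point).
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

@tf.function  # the single outer tf.function wrapping strategy.run
def distributed_step(x, y):
    per_replica_loss = strategy.run(step_fn, args=(x, y))
    return strategy.reduce(tf.distribute.ReduceOp.MEAN, per_replica_loss, axis=None)
```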

    opened by HoJ-Onle 0
  • Some Errors


I hit an error when I tried to evaluate the model, and I don't know what I should do:

    ValueError: in user code:

        /opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:1947 split **
            axis=axis, num_split=num_or_size_splits, value=value, name=name)
        /opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py:9723 split
            "Split", split_dim=axis, value=value, num_split=num_split, name=name)
        /opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:744 _apply_op_helper
            attrs=attr_protos, op_def=op_def)
        /opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py:595 _create_op_internal
            compute_device)
        /opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:3327 _create_op_internal
            op_def=op_def)
        /opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1817 __init__
            control_input_ops, op_def)
        /opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1657 _create_c_op
            raise ValueError(str(e))

    ValueError: Dimension size must be evenly divisible by 3 but is 64
        Number of ways to split should evenly divide the split dimension for '{{node split}} = Split[T=DT_UINT8, num_split=3](split/split_dim, input_tensor)' with input shapes: [], [64,32,32,3] and with computed input tensors: input[0] = <0>.
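For what it's worth, the shapes in the message show a split with num_split=3 along axis 0 (the batch dimension of size 64), which is what fails. A minimal reproduction, with the channel-axis variant that would succeed (illustrative, not the repo's code):

```python
import tensorflow as tf

x = tf.zeros([64, 32, 32, 3], dtype=tf.uint8)

# Reproduces the error: 3 does not evenly divide the axis-0 size of 64.
# tf.split(x, num_or_size_splits=3, axis=0)  # ValueError

# Splitting the channel axis instead succeeds (3 divides 3):
r, g, b = tf.split(x, num_or_size_splits=3, axis=-1)
```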
    
    opened by HoJ-Onle 0
  • Code implementation for the NLL metric


    Hi,

Thanks a lot for your amazing work. I am currently reproducing it, but I find that the code for calculating the NLL metric (i.e., Table 4 in the paper) is missing. Will you release the code for evaluating the NLL metric, or could you provide pseudo-code for reference?

    Thanks.

    opened by chen-hao-chao 0
  • It does not seem to converge when setting K=30, T=6


Hello, I have cloned your code and run it with the default settings from config.py. However, it fails to converge: the contrastive losses are large and negative, and the gradients become NaN. The log is below.

    output.log

    Logging ........
    2021-07-16 20:11:02,055 : gpus=0
    2021-07-16 20:11:02,056 : {'logtostderr': False, 'alsologtostderr': False, 'log_dir': '', 'v': 0, 'verbosity': 0, 'logger_levels': {}, 'stderrthreshold': 'fatal', 'showprefixforinfo': True, 'run_with_pdb': False, 'pdb_post_mortem': False, 'pdb': None, 'run_with_profiling': False, 'profile_file': None, 'use_cprofile_for_profiling': True, 'only_check_args': False, 'runtime_oom_exit': True, 'op_conversion_fallback_to_while_loop': False, 'test_srcdir': '', 'test_tmpdir': '/tmp/absl_testing', 'test_random_seed': 301, 'test_randomize_ordering_seed': '', 'xml_output_file': '', 'jobid': 0, 'logdir': '', 'eager': False, 'ckpt_load': None, 'device': '0', 'tpu': False, 'tpu_name': None, 'tpu_zone': None, 'rnd_seed': 1, 'problem': 'cifar10', 'n_batch_train': 64, 'lr': 0.0001, 'beta_1': 0.9, 'n_iters': 1000000, 'grad_clip': False, 'warmup': 1000, 'n_batch_per_iter': 1, 'cosine_decay': False, 'opt': 'adam', 'eval': False, 'include_xpred_freq': 1, 'eval_fid': False, 'fid_n_samples': 64, 'fid_n_iters': 40000, 'fid_n_batch': 64, 'num_res_blocks': 8, 'num_diffusion_timesteps': 6, 'randflip': True, 'dropout': 0.0, 'normalize': None, 'act': 'lrelu', 'final_act': 'relu', 'use_attention': False, 'resamp_with_conv': False, 'spec_norm': True, 'res_conv_shortcut': True, 'res_use_scale': True, 'ma_decay': 0.999, 'noise_scale': 1.0, 'mcmc_num_steps': 30, 'mcmc_step_size_b_square': 0.0002, 'tfhub_cache_dir': None, 'tfhub_model_load_format': 'AUTO', '?': False, 'help': False, 'helpshort': False, 'helpfull': False, 'helpxml': False, 'output': './output/main/2021-07-16-20-10-55--num_res_blocks=8--n_batch_train=64'}
    2021-07-16 20:11:02,238 : output dir ./output/main/2021-07-16-20-10-55--num_res_blocks=8--n_batch_train=64
    2021-07-16 20:40:31,668 : ========== begin training =========
    2021-07-16 20:42:38,251 : dir=2021-07-16-20-10-55--num_res_blocks=8--n_batch_train=64 i= 0 loss=-2228.8938 learning grads mean= nan grads max=448971.1250 disp=2.339, 15.666, 28.026, 40.882, 53.527, 42.221, 42.221 loss_ts=19446.877, 16269.05, 13148.938, 11828.364, 11097.337, 5871.38, 5871.38 f_ts=17548.744, 19199.219, 23628.014, 21098.88, 26118.945, 14722.812, 14722.812 is_accepted_ts= 0.0000 lr=0.00000010 time=126.57s
    2021-07-16 20:43:13,702 : early exit due to nan
    2021-07-16 20:43:13,703 : done

Can you tell me how to reproduce the results in the paper? Thanks.

    opened by ljrprocc 1
Owner
Ruiqi Gao
Ph.D. student at VCLA, UCLA. Research interests: machine learning, computer vision, and artificial intelligence.