Noise Conditional Score Networks (NeurIPS 2019, Oral)

Overview

Generative Modeling by Estimating Gradients of the Data Distribution

This repo contains the official implementation for the NeurIPS 2019 paper Generative Modeling by Estimating Gradients of the Data Distribution,

by Yang Song and Stefano Ermon, Stanford AI Lab.

Note: The method has been greatly stabilized by the subsequent work Improved Techniques for Training Score-Based Generative Models (code) and more recently extended by Score-Based Generative Modeling through Stochastic Differential Equations (code). This codebase is therefore no longer recommended for new projects.


We describe a new method for generative modeling based on estimating the derivative of the log density function (a.k.a. the Stein score) of the data distribution. We first perturb the training data with Gaussian noise of progressively smaller variances. Next, we estimate the score function of each perturbed data distribution by training a single shared neural network, the Noise Conditional Score Network (NCSN), with score matching. Samples can then be produced directly from the NCSN with annealed Langevin dynamics.
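
For intuition, the following is a minimal sketch of annealed Langevin dynamics as described in Algorithm 1 of the paper; the function signature and tensor names are illustrative assumptions, not the actual sampling code in this repo.

import torch

@torch.no_grad()
def annealed_langevin_dynamics(score_net, x, sigmas, n_steps_each=100, step_lr=2e-5):
    # Illustrative sketch of Algorithm 1; `score_net(x, labels)` is assumed to
    # return the estimated score for the noise level indexed by `labels`.
    for i, sigma in enumerate(sigmas):                   # sigmas: floats, largest to smallest
        step_size = step_lr * (sigma / sigmas[-1]) ** 2  # alpha_i = eps * sigma_i^2 / sigma_L^2
        labels = torch.full((x.shape[0],), i, dtype=torch.long, device=x.device)
        for _ in range(n_steps_each):
            z = torch.randn_like(x)
            grad = score_net(x, labels)                  # estimated score at noise level sigma_i
            x = x + step_size / 2 * grad + (step_size ** 0.5) * z
    return x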

Dependencies

  • PyTorch

  • PyYAML

  • tqdm

  • pillow

  • tensorboardX

  • seaborn
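
Assuming the usual PyPI package names (the repo does not pin versions), the dependencies can typically be installed with pip:

pip install torch pyyaml tqdm pillow tensorboardX seaborn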

Running Experiments

Project Structure

main.py is the common gateway to all experiments. Type python main.py --help to get its usage description.

usage: main.py [-h] [--runner RUNNER] [--config CONFIG] [--seed SEED]
               [--run RUN] [--doc DOC] [--comment COMMENT] [--verbose VERBOSE]
               [--test] [--resume_training] [-o IMAGE_FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  --runner RUNNER       The runner to execute
  --config CONFIG       Path to the config file
  --seed SEED           Random seed
  --run RUN             Path for saving running related data.
  --doc DOC             A string for documentation purpose
  --verbose VERBOSE     Verbose level: info | debug | warning | critical
  --test                Whether to test the model
  --resume_training     Whether to resume training
  -o IMAGE_FOLDER, --image_folder IMAGE_FOLDER
                        The directory of image outputs

There are four runner classes.

  • AnnealRunner The main runner class for experiments related to NCSN and annealed Langevin dynamics.
  • BaselineRunner Compared to AnnealRunner, this one does not anneal the noise. Instead, it uses a single fixed noise variance.
  • ScoreNetRunner This is the runner class for reproducing the experiments of Figure 1 (Middle, Right).
  • ToyRunner This is the runner class for reproducing the experiments of Figure 2 and Figure 3.

Configuration files are stored in configs/. For example, the configuration file of AnnealRunner is configs/anneal.yml. Log files are commonly stored in run/logs/doc_name, and tensorboard files are in run/tensorboard/doc_name. Here doc_name is the value fed to option --doc.
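
The exact schema is defined by the files in configs/. As a rough illustration, a config like configs/anneal.yml controls settings of the following kind; the key names below are hypothetical, not copied from the actual file.

# Hypothetical illustration only; key names do not necessarily match configs/anneal.yml.
data:
  dataset: CIFAR10
  image_size: 32
training:
  batch_size: 128
  n_iters: 200000
model:
  sigma_begin: 1.0     # largest noise scale
  sigma_end: 0.01      # smallest noise scale
  num_noise_levels: 10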

Training

The usage of main.py is quite self-evident. For example, we can train an NCSN by running

python main.py --runner AnnealRunner --config anneal.yml --doc cifar10

The model will then be trained according to the configuration file configs/anneal.yml. The log files will be stored in run/logs/cifar10, and the tensorboard logs in run/tensorboard/cifar10.
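
For reference, here is a minimal sketch of the noise-conditional denoising score matching objective (Eqs. 5-6 of the paper) that AnnealRunner optimizes; the variable names and the score_net interface are illustrative assumptions, not the repo's actual loss implementation.

import torch

def anneal_dsm_loss(score_net, x, sigmas, anneal_power=2.0):
    # Sketch of the noise-conditional DSM objective. `sigmas` is assumed to be a
    # 1-D tensor of noise levels {sigma_1, ..., sigma_L}; `score_net(x_tilde, labels)`
    # is assumed to return the score estimate for the noise level sigmas[labels].
    labels = torch.randint(0, len(sigmas), (x.shape[0],), device=x.device)
    used_sigmas = sigmas[labels].view(x.shape[0], *([1] * (x.dim() - 1)))

    noise = torch.randn_like(x) * used_sigmas            # x_tilde = x + sigma * z
    x_tilde = x + noise
    target = -noise / used_sigmas ** 2                   # score of the Gaussian perturbation kernel
    scores = score_net(x_tilde, labels)

    per_sample = 0.5 * ((scores - target) ** 2).flatten(start_dim=1).sum(dim=-1)
    return (per_sample * sigmas[labels] ** anneal_power).mean()  # lambda(sigma) = sigma^2 weighting

Here anneal_power=2 corresponds to the weighting lambda(sigma) = sigma^2 used in the paper.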

Sampling

Suppose the log files are stored in run/logs/cifar10. We can produce samples to folder samples by running

python main.py --runner AnnealRunner --test -o samples

Checkpoints

We provide pretrained checkpoints in run.zip. Extract the archive to the root folder of the repo. You should be able to produce samples like the following using these checkpoints.

The original README shows a table of animated sampling GIFs, one per dataset: MNIST, CelebA, and CIFAR-10 (images not reproduced here).

Evaluation

Please refer to Appendix B.2 of our paper for details on hyperparameters and model selection. When computing Inception and FID scores, we first generate images from our model, then use the official code from OpenAI and the original code from the TTUR authors to obtain the scores.
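
The reported numbers come from those official implementations. Purely as an optional sanity check (not the toolchain used for the paper's results), a library such as torch-fidelity can compute both metrics from a folder of generated images, assuming it is installed via pip install torch-fidelity:

from torch_fidelity import calculate_metrics

# 'samples' is assumed to be a folder of generated PNG images;
# 'cifar10-train' is torch-fidelity's registered name for the CIFAR-10 training set.
metrics = calculate_metrics(
    input1='samples',
    input2='cifar10-train',
    isc=True,   # Inception Score
    fid=True,   # Frechet Inception Distance
)
print(metrics)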

References

Large parts of the code are derived from this GitHub repo (the official implementation of the sliced score matching paper).

If you find the code / idea inspiring for your research, please consider citing the following

@inproceedings{song2019generative,
  title={Generative Modeling by Estimating Gradients of the Data Distribution},
  author={Song, Yang and Ermon, Stefano},
  booktitle={Advances in Neural Information Processing Systems},
  pages={11895--11907},
  year={2019}
}

and / or

@inproceedings{song2019sliced,
  author    = {Yang Song and
               Sahaj Garg and
               Jiaxin Shi and
               Stefano Ermon},
  title     = {Sliced Score Matching: {A} Scalable Approach to Density and Score
               Estimation},
  booktitle = {Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial
               Intelligence, {UAI} 2019, Tel Aviv, Israel, July 22-25, 2019},
  pages     = {204},
  year      = {2019},
  url       = {http://auai.org/uai2019/proceedings/papers/204.pdf},
}
Comments
  • Why is noise added twice?

    Hi,

    Thank you for the paper and code.

    I noticed that when training with the annealing strategy, the samples are corrupted twice -- the first one is here, and the second one is here. In my understanding, the second one corresponds to { \sigma_1, ..., \sigma_L } as described in the paper. May I ask why the first one is needed?

    Thank you, and I look forward to your reply!

    opened by zhikaili 2
  • About the condition in the conditional score network

    Hi, thanks for your great work and for sharing the code! I have a little confusion after tracing the code. In the original paper, the objective of Eq. (2) can be expanded as Eq. (5) under the denoising score matching scenario. In that equation, the score network produces the gradient of the log data density, conditioned on the perturbed sample and the noise scale sigma (see the snippet on page 6).

    However, in the code the score network computes a score conditioned on the data sample and a noise-level label; here is the training code. So which one is correct?

    By the way, this is not a big issue :-) A new noise conditioning technique was introduced in the later version (NCSNv2), where the score network is conditioned only on the sample. I just want to confirm the correctness of the old version.

    opened by SunnerLi 2
  • Baseline does not converge

    I ran the baseline model with the default settings, but it does not seem to converge (CIFAR-10); the loss is huge, around 6,600,000. Is this normal, or is it just my run? If it is normal, why is that? Based on my research, traditional Langevin dynamics can easily converge within a restricted number of steps, such as 50. I'm quite curious why the author set the number of sampling steps to 1000, with a relatively small learning rate.

    Thanks for your excellent work; the annealed model converges well, by the way. I'm just trying to figure out why this method works. Could you please explain?

    opened by uiyo 1
  • Custom dataset

    Hi, thanks for the great work. Because I couldn't find any information about custom datasets, I wonder whether this repo works for training and generation on a custom dataset? If so, do I have to convert my images to (for example) the CIFAR-10 format?

    opened by ivder 1
  • Model stays in eval mode forever after the first validation log

    Hi, and thank you for your work.

    I think there may be a bug in anneal_runner, caused by logging the DSM loss on validation. Specifically, the model is put into eval() mode at the following line

    https://github.com/ermongroup/ncsn/blob/c4654a258e54a7da0c916132cebdd0818fae8fd2/runners/anneal_runner.py#L158

    and it is never put back into train() mode.

    Best, D

    opened by DavideA 1
  • A question on convergence of DSM

    Hey,

    I'm currently tracing out the story of diffusion generative models, and right now I'm studying the denoising score matching (DSM) objective. I've noticed that your multi-scale approach relies heavily on it (and the original paper is quite old), so I decided to ask my question here.

    I've gone through the theory of DSM and have a good grip on how and why it works. However, in practice I observe slow convergence (much slower than with ISM) on toy examples. In particular, I believe this might be due to the type of noise distribution selected. While not restricted to it, everyone seems to use a normal distribution since it provides a simple derivative, namely 1/sigma**2 * (orig - perturbed). In practice, I've observed that the scale term in front causes the derivative to take values on the order of 1e4 for sigma=1e-2, and the loss jumps around quite heavily. The smaller sigma is, the slower the convergence. The loss never actually decreases, but the resulting gradient field looks comparable to what ISM gives.

    Did you observe this in your experiments as well?

    opened by cheind 5
  • Annealed GMM Analytical Log Probabilities

    This may be a bit pedantic, but I wonder if there is a bug in the way the log probabilities are calculated for the annealed GMM distribution. My understanding is that annealed sampling uses scores from the ‘noise distribution’, which should reflect a Gaussian-perturbed input. I’m assuming you achieve this by adding the noise distribution to the original GMM, hence allowing you to calculate the true log probabilities from the same GMM, now scaled with the noise sigmas.

    However, adding two Gaussian random variables should add the variances from both distributions as shown (taken from Wikipedia)

    [Screenshot of the standard identity: for independent X ~ N(mu_x, sigma_x^2) and Y ~ N(mu_y, sigma_y^2), X + Y ~ N(mu_x + mu_y, sigma_x^2 + sigma_y^2).]

    In this case sigma_x would be the original GMM sigma (which is set to 1), and sigma_y would correspond to the noise level. So shouldn't the calculation in the lines below be using something like sigma = np.sqrt(sigma**2 + self.sigma**2) instead of just the sigma passed into the function? In my own tests, sampling with these updated sigmas gives results more faithful to the original distribution.

    https://github.com/ermongroup/ncsn/blob/adb98fb5c9533b79933bfee0b3e883dc33f84e7a/models/gmm.py#L45-L46

    opened by ahsanMah 0
  • Error loading pre-trained checkpoints

    Hi,

    Thank you for your great work. I downloaded the checkpoints from Google Drive and tried to do sampling, but I got this error:

    RuntimeError: Error(s) in loading state_dict for DataParallel:
    	Missing key(s) in state_dict: "module.normalizer.alpha", "module.normalizer.gamma", "module.normalizer.beta", "module.res1.0.normalize2.alpha", "module.res1.0.normalize2.gamma"
    ....
    

    I was wondering how you and others load the model wrapped with DataParallel. I can train the model from scratch and fix the saving, but I wanted to check whether there is a better workaround.

    opened by thanh-isu 1