Data-Efficient GANs with DiffAugment

project | paper | datasets | video | slides

Generated using only 100 images each of Obama, grumpy cats, pandas, the Bridge of Sighs, the Medici Fountain, and the Temple of Heaven, without pre-training.

[NEW!] PyTorch training with DiffAugment-stylegan2-pytorch is now available!

[NEW!] Our Colab tutorial is released!

[NEW!] FFHQ training is supported! See the DiffAugment-stylegan2 README.

[NEW!] Time to generate 100-shot interpolation videos with generate_gif.py!

[NEW!] Our DiffAugment-biggan-imagenet repo (for TPU training) is released!

[NEW!] Our DiffAugment-biggan-cifar PyTorch repo is released!

This repository contains our implementation of Differentiable Augmentation (DiffAugment) in both PyTorch and TensorFlow, which can significantly improve the data efficiency of GAN training. We provide DiffAugment-stylegan2 (TensorFlow), DiffAugment-stylegan2-pytorch, and DiffAugment-biggan-cifar (PyTorch) for GPU training, and DiffAugment-biggan-imagenet (TensorFlow) for TPU training.

Low-shot generation without pre-training. With DiffAugment, our model can generate high-fidelity images using only 100 Obama portraits, grumpy cats, or pandas from our collected 100-shot datasets, or 160 cats or 389 dogs from the AnimalFace dataset, at 256×256 resolution.

Unconditional generation results on CIFAR-10. StyleGAN2's performance degrades drastically with less training data. With DiffAugment, we roughly match its FID and outperform its Inception Score (IS) using only 20% of the training data.

Differentiable Augmentation for Data-Efficient GAN Training
Shengyu Zhao, Zhijian Liu, Ji Lin, Jun-Yan Zhu, and Song Han
MIT, Tsinghua University, Adobe Research, CMU
arXiv

Overview

Overview of DiffAugment for updating D (left) and G (right). DiffAugment applies the augmentation T to both the real sample x and the generated output G(z). When we update G, gradients need to be back-propagated through T (iii), which requires T to be differentiable w.r.t. the input.
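To make (iii) concrete, here is a minimal PyTorch sketch (not the repo's code; T here is a hypothetical toy augmentation) showing that gradients reach the generator's output through T:

import torch

def T(x):
    # Toy augmentation: per-sample random brightness shift. Plain tensor
    # arithmetic, so it is differentiable w.r.t. x.
    return x + (torch.rand(x.size(0), 1, 1, 1) - 0.5)

fakes = torch.randn(4, 3, 8, 8, requires_grad=True)  # stands in for G(z)
score = T(fakes).sum()                                # stands in for D(T(G(z)))
score.backward()
assert fakes.grad is not None  # gradients flow back through T to G's output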

Training and Generation with 100 Images

To generate an interpolation video using our pre-trained models:

cd DiffAugment-stylegan2
python generate_gif.py -r mit-han-lab:DiffAugment-stylegan2-100-shot-obama.pkl -o obama.gif

or to train a new model:

python run_low_shot.py --dataset=100-shot-obama --num-gpus=4

You may also try out 100-shot-grumpy_cat, 100-shot-panda, 100-shot-bridge_of_sighs, 100-shot-medici_fountain, 100-shot-temple_of_heaven, 100-shot-wuzhen, or a folder containing your own training images. Please refer to the DiffAugment-stylegan2 README for dependencies and details.

[NEW!] PyTorch training is now available:

cd DiffAugment-stylegan2-pytorch
python train.py --outdir=training-runs --data=https://data-efficient-gans.mit.edu/datasets/100-shot-obama.zip --gpus=1

DiffAugment for StyleGAN2

To run StyleGAN2 + DiffAugment for unconditional generation on the 100-shot datasets, CIFAR, FFHQ, or LSUN, please refer to the DiffAugment-stylegan2 README or DiffAugment-stylegan2-pytorch for the PyTorch version.

DiffAugment for BigGAN

Please refer to the DiffAugment-biggan-cifar README to run BigGAN + DiffAugment for conditional generation on CIFAR (using GPUs), and the DiffAugment-biggan-imagenet README to run on ImageNet (using TPUs).

Using DiffAugment for Your Own Training

To help you use DiffAugment in your own codebase, we provide portable DiffAugment operations in both TensorFlow and PyTorch versions: DiffAugment_tf.py and DiffAugment_pytorch.py. Generally, DiffAugment can be adopted in any model simply by substituting every D(x) with D(T(x)), where x can be real or fake images, D is the discriminator, and T is the DiffAugment operation. For example,

from DiffAugment_pytorch import DiffAugment
# from DiffAugment_tf import DiffAugment

# If your dataset is as small as ours (e.g., hundreds of images), we recommend
# the strongest combination: Color + Translation + Cutout. For large datasets,
# try a subset of ['color', 'translation', 'cutout']. Feel free to discover
# more DiffAugment transformations!
policy = 'color,translation,cutout'

...
# Training loop: update D
reals = sample_real_images() # a batch of real images
z = sample_latent_vectors()
fakes = Generator(z) # a batch of fake images
real_scores = Discriminator(DiffAugment(reals, policy=policy))
fake_scores = Discriminator(DiffAugment(fakes, policy=policy))
# Calculating D's loss based on real_scores and fake_scores...
...

...
# Training loop: update G
z = sample_latent_vectors()
fakes = Generator(z) # a batch of fake images
fake_scores = Discriminator(DiffAugment(fakes, policy=policy))
# Calculating G's loss based on fake_scores...
...

We have implemented Color, Translation, and Cutout DiffAugment as visualized below:
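For intuition, here is a simplified, unvectorized sketch of a Cutout-style op (the repo's DiffAugment_pytorch.py implements a batched version; this is for illustration only). Zeroing a patch is an elementwise multiply by a mask, so the op stays differentiable w.r.t. the input:

import torch

def rand_cutout_sketch(x, ratio=0.5):
    # Zero out one random square patch per sample. Masking is an
    # elementwise multiply, so gradients pass through the kept pixels.
    n, _, h, w = x.shape
    ch, cw = int(h * ratio), int(w * ratio)
    mask = torch.ones_like(x)
    for i in range(n):
        top = int(torch.randint(0, h - ch + 1, (1,)))
        left = int(torch.randint(0, w - cw + 1, (1,)))
        mask[i, :, top:top + ch, left:left + cw] = 0
    return x * mask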

Citation

If you find this code helpful, please cite our paper:

@inproceedings{zhao2020diffaugment,
  title={Differentiable Augmentation for Data-Efficient GAN Training},
  author={Zhao, Shengyu and Liu, Zhijian and Lin, Ji and Zhu, Jun-Yan and Han, Song},
  booktitle={Conference on Neural Information Processing Systems (NeurIPS)},
  year={2020}
}

Acknowledgements

We thank NSF Career Award #1943349, MIT-IBM Watson AI Lab, Google, Adobe, and Sony for supporting this research. Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC). We thank William S. Peebles and Yijun Li for helpful comments.

Comments
  • OOM and CUDA Errors

    Hi,

    I am getting an OOM error on Colab (P100, 16 GB) with the following:

    cd DiffAugment-stylegan2
    python run_few_shot.py --dataset=100-shot-obama --num-gpus=1
    
    Traceback (most recent call last):
      File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1365, in _do_call
        return fn(*args)
      File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1350, in _run_fn
        target_list, run_metadata)
      File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
        run_metadata)
    tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
    (0) Resource exhausted: OOM when allocating tensor with shape[32,128,256,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    	 [[{{node GPU0/loss/D_1/256x256/Conv0/FusedBiasAct}}]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    
    	 [[TrainG/Apply0/cond_111/pred_id/_2541]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    
      (1) Resource exhausted: OOM when allocating tensor with shape[32,128,256,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    	 [[{{node GPU0/loss/D_1/256x256/Conv0/FusedBiasAct}}]]
    

    So I tried it on 8×V100s. It gave an OOM error with my dataset at 1024×1024, but the Obama dataset got as far as the output below and then gave another error.

    tick 0     kimg 0.1      lod 0.00  minibatch 32   time 49s          sec/tick 49.1    sec/kimg 383.77  maintenance 0.0    gpumem 6.3
    Downloading http://d36zk2xti64re0.cloudfront.net/stylegan1/networks/metrics/inception_v3_features.pkl ... done
    network-snapshot-000000        time 3m 09s       fid5k-train 396.6058
    2020-07-02 13:34:00.935725: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
    2020-07-02 13:34:00.935780: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1
    Aborted (core dumped)
    

    CUDA details:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:01_CDT_2018
    Cuda compilation tools, release 10.0, V10.0.130

    What is the max resolution supported on 16 GB RAM? Sorry to mix two issues. I can open a separate issue for the CUDA error if needed.

    opened by atugar 11
  • policy="translation" fails

    Hi, I tested your code with two frameworks and two data formats, and in both cases the "translation" policy fails!

    Environment: Python 3.7, Ubuntu 18.04, CUDA 10.1, PyTorch 1.4.0

    Test1:

    I used your code in the BMSG-GAN repo to train on a 3-channel dataset (102 Flowers), using only 755 samples as a check. When the policy included "translation", training failed:

    RuntimeError: CuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input

    but if I just use the "color,cutout" policy, it works fine!

    Test2:

    In my own GAN with a 1-channel dataset, the behavior is the same: without the "translation" policy it works fine; otherwise it fails.
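
    A hedged workaround sketch (not verified against this exact setup, reusing the names from the README snippet): the translation op builds its output with advanced indexing and a final permute, which can yield a non-contiguous tensor that some cuDNN 7.x kernels reject. Forcing a contiguous layout before the discriminator may avoid the error:

    # hypothetical workaround: make the augmented batch contiguous for cuDNN
    fake_scores = Discriminator(DiffAugment(fakes, policy=policy).contiguous())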

    opened by Johnson-yue 10
  • The D and G values for updating

    Hi, would you please help me find where exactly in your code the D(T(x)) and D(T(G(z))) values from (i), (ii), and (iii) are computed?


    opened by Mshz2 9
  • Resume training progress and see images created every tick

    I am training my custom dataset with run_low_shot.py on Google Colab. How can I resume my training progress and see the images created every tick? I looked at the training_loop.py script but I don't know how to implement it.

    opened by iumyx2612 9
  • FID increases for larger training datasets

    I trained a generator on 3.8K images at 256×256 following the settings described, and trained another generator on 750 images with the same settings but with regularization increased from 1 to 10, as in the paper.

    While the training FID-5K score is lower when training on 3.8K images vs. 750 (both training FID curves are stable), I find that the FID score computed using the generator trained on 3.8K is much higher than that attained with 750.

    I am using the same code to compute the FID scores so it is unclear to me why this is the case. Have you encountered this issue?

    opened by slala2121 9
  • Q: Why not use adjust_brightness of torchvision?

    Why not use functions from torchvision instead of writing your own? Is it because adjust_brightness, adjust_saturation, and adjust_contrast of torchvision are not differentiable? Because I thought they were. Thanks for your answer. :)
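
    For context, the PyTorch DiffAugment color ops are plain batched tensor arithmetic, which keeps them differentiable and draws an independent random factor per sample in a single call; the torchvision functions take one fixed factor and operate image-by-image. A rough sketch of a per-sample brightness op in that style (see DiffAugment_pytorch.py for the exact code):

    import torch

    def rand_brightness_sketch(x):
        # One independent brightness shift per sample in the batch;
        # autograd differentiates the addition for free.
        return x + (torch.rand(x.size(0), 1, 1, 1, device=x.device, dtype=x.dtype) - 0.5)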

    opened by alexrey88 8
  • Question about Training Time

    Hi,

    Great work. Congrats.

    I would like to train your model on this fruits dataset: https://www.kaggle.com/moltean/fruits

    It has about 80K images, 100 classes.

    How long do you think it would take to train GANs on Colab for all classes? Does the model train one class at a time, or multiple classes all at once?

    Thanks for the help and great work!

    opened by accountspark-bo-peng 8
  • Comparison with the 3 other DiffAugment papers published

    The core insight of this paper, applying data augmentation to both the reals and fakes while training D, has recently been published by (at least) 3 other papers: Zhao, Tran, and Karras (in that chronological order). A comparison and contrast with the differing results in those papers would be very useful for the README and future versions of this paper.

    In particular, I would like to know: did you simply disable path length regularization in StyleGAN2 rather than work around the higher-order gradient issues? Why do you think your D-only augmentation diverged when Zhao (the first) does all their experiments with only D augmentation without any issue at all? Did you experiment with stronger or weaker settings for each data augmentation to understand if the stack of multiple data augmentations is collectively too weak or too strong? Also, one part of the paper seems ambiguous: how exactly are the data augmentations done - does it pick one augmentation at random per batch, one augmentation per image, or does it apply all 1/2/3 augmentations to each image as a stack? The paper seems to suggest, given the emphasis on strong augmentation, that it's applying as a stack, but it never actually seems to say (and looking at the source code didn't help).

    opened by gwern 8
  • Better FID with smaller batch size

    Hi, thanks for the quick release of the code. The following is not an issue, but an observation I made while playing around with the code. If we keep everything the same, and simply reduce the batch size to 16 (default is 32), the FID for the Obama dataset improves from 54.39 (reported in the paper) to 47.0032. Was there a trend with variation in batch size that the authors observed in the scenario of few-shot generation?

    opened by utkarshojha 7
  • Evaluating discriminator accuracy

    How did you measure the discriminator training/validation accuracy (as in Fig. 6 from the paper)? When I evaluate on real images, I find that the discriminator generally classifies them as fake; on synthesized images, it classifies them as fake as well.

    Below is an example of how I'm computing the accuracy on the real data:

    with tf.device('/gpu:0'):
        print('Constructing networks...')
        D_neg = tflib.Network('D', num_channels=num_channels, resolution=resolution, label_size=label_size, **D_args)
        resume_networks = misc.load_pkl(resume_pkl_neg)
        rG, rD, rGs = resume_networks
        D_neg.copy_vars_from(rD)

        dl = torch.utils.data.DataLoader(neg_dataset, batch_size=batch_size)
        d_logits_neg = []
        for batch_index, (img, label) in enumerate(dl):
            # d_out_neg = D_neg.run(img.numpy(), is_training=False, minibatch_size=batch_size)
            if img.shape[0] % batch_size != 0:
                break
            with sess.as_default():
                d_out_neg = D_neg.get_output_for(tf.convert_to_tensor(img.numpy()), is_training=False).eval()
            d_logits_neg.extend(list(d_out_neg))
        d_scores_neg = sigmoid(torch.tensor(d_logits_neg))
        num_correct_neg = len(np.where(d_scores_neg >= 0.5)[0])
        d_acc = num_correct_neg * 1.0 / len(neg_dataset)

    Thanks.

    opened by slala2121 6
  • AttributeError: module 'tensorflow.compat.v2' has no attribute 'contrib'

    When trying to train the model with a custom dataset, I got this error. From what I found, the TensorFlow version needs to be upgraded, but you require version 1.14. Is this a contradiction? My installed version is 1.14.

    opened by hzc5095 6
  • generate_gif.py error: Cannot handle this data type: (1, 1, 64), |u1

    I used generate_gif.py with my own 64×64 grayscale data, but it reports an error: Cannot handle this data type: (1, 1, 64), |u1. How do I modify the file? Thank you!
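
    A hedged guess at a fix (the exact tensor layout in generate_gif.py may differ): the "(1, 1, 64)" in the error suggests a channel-first (C, H, W) array reached PIL, which reads the last axis as channels. Converting each frame to a 2-D array with an explicit grayscale mode usually resolves it:

    import numpy as np
    from PIL import Image

    frame_chw = np.zeros((1, 64, 64), dtype=np.uint8)   # stands in for one generated frame
    frame = Image.fromarray(frame_chw.squeeze(0), 'L')  # 2-D array + explicit 'L' mode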

    opened by Jordan-Liao 0
  • The difference between the generated image and the training image is too large

    Training set: [image]. Generated images: [image].

    These are my training options in train.py:

    @click.option('--outdir', default='/work/ai_lab/miner/data-efficient-gans/output', help='Where to save the results', required=True, metavar='DIR')
    @click.option('--gpus', default=2, help='Number of GPUs to use [default: 1]', type=int, metavar='INT')
    @click.option('--snap', default=10, help='Snapshot interval [default: 50 ticks]', type=int, metavar='INT')
    @click.option('--metrics', help='Comma-separated list or "none" [default: fid50k_full]', type=CommaSeparatedList())
    @click.option('--seed', help='Random seed [default: 0]', type=int, metavar='INT')
    @click.option('-n', '--dry-run', help='Print training options and exit', is_flag=True)

    # Dataset.
    @click.option('--data', default=r'/work/ai_lab/miner/data-efficient-gans/bicycle', help='Training data (directory or zip)', metavar='PATH', required=True)
    @click.option('--cond', help='Train conditional model based on dataset labels [default: false]', type=bool, metavar='BOOL')
    @click.option('--subset', help='Train with only N images [default: all]', type=int, metavar='INT')
    @click.option('--mirror', default=True, help='Enable dataset x-flips [default: false]', type=bool, metavar='BOOL')

    # Base config.
    @click.option('--cfg', help='Base config [default: low_shot]', type=click.Choice(['low_shot', 'auto', 'stylegan2', 'paper256', 'paper512', 'paper1024', 'cifar']))
    @click.option('--gamma', help='Override R1 gamma', type=float)
    @click.option('--kimg', help='Override training duration', type=int, metavar='INT')
    @click.option('--batch', default=8, help='Override batch size', type=int, metavar='INT')

    # Discriminator augmentation.
    @click.option('--DiffAugment', help='Comma-separated list of DiffAugment policy [default: color,translation,cutout]', type=str)
    @click.option('--aug', help='Augmentation mode [default: ada]', type=click.Choice(['noaug', 'ada', 'fixed']))
    @click.option('--p', help='Augmentation probability for --aug=fixed', type=float)
    @click.option('--target', help='ADA target value for --aug=ada', type=float)
    @click.option('--augpipe', help='Augmentation pipeline [default: bgc]', type=click.Choice(['blit', 'geom', 'color', 'filter', 'noise', 'cutout', 'bg', 'bgc', 'bgcf', 'bgcfn', 'bgcfnc']))

    # Transfer learning.
    @click.option('--resume', help='Resume training [default: noresume]', metavar='PKL')
    @click.option('--freezed', help='Freeze-D [default: 0 layers]', type=int, metavar='INT')

    # Performance options.
    @click.option('--fp32', help='Disable mixed-precision training', type=bool, metavar='BOOL')
    @click.option('--nhwc', help='Use NHWC memory format with FP16', type=bool, metavar='BOOL')
    @click.option('--nobench', help='Disable cuDNN benchmarking', type=bool, metavar='BOOL')
    @click.option('--allow-tf32', help='Allow PyTorch to use TF32 internally', type=bool, metavar='BOOL')
    @click.option('--workers', default=4, help='Override number of DataLoader workers', type=int, metavar='INT')

    Could you give me some advice to make the results better? Thanks.

    opened by tdf1995 0
  • RuntimeError: No such operator aten::cudnn_convolution_backward_weight

    I'm trying to train a new model myself using the following command:

    python train.py --outdir=training-runs --data=https://data-efficient-gans.mit.edu/datasets/100-shot-obama.zip --gpus=1
    

    And I ran into the Error:

    return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
    Traceback (most recent call last):
      File "F:\USB Backup\Clemson University\Course\Artificial Intellengent\Project\data-efficient-gans-master\data-efficient-gans-master\DiffAugment-stylegan2-pytorch\train.py", line 554, in <module>
        main()  # pylint: disable=no-value-for-parameter
      File "C:\Users\Dinner\.conda\envs\pytorch\lib\site-packages\click\core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "C:\Users\Dinner\.conda\envs\pytorch\lib\site-packages\click\core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "C:\Users\Dinner\.conda\envs\pytorch\lib\site-packages\click\core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "C:\Users\Dinner\.conda\envs\pytorch\lib\site-packages\click\core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "C:\Users\Dinner\.conda\envs\pytorch\lib\site-packages\click\decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "F:\USB Backup\Clemson University\Course\Artificial Intellengent\Project\data-efficient-gans-master\data-efficient-gans-master\DiffAugment-stylegan2-pytorch\train.py", line 547, in main
        subprocess_fn(rank=0, args=args, temp_dir=temp_dir)
      File "F:\USB Backup\Clemson University\Course\Artificial Intellengent\Project\data-efficient-gans-master\data-efficient-gans-master\DiffAugment-stylegan2-pytorch\train.py", line 398, in subprocess_fn
        training_loop.training_loop(rank=rank, **args)
      File "F:\USB Backup\Clemson University\Course\Artificial Intellengent\Project\data-efficient-gans-master\data-efficient-gans-master\DiffAugment-stylegan2-pytorch\training\training_loop.py", line 284, in training_loop
        loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, sync=sync, gain=gain)
      File "F:\USB Backup\Clemson University\Course\Artificial Intellengent\Project\data-efficient-gans-master\data-efficient-gans-master\DiffAugment-stylegan2-pytorch\training\loss.py", line 79, in accumulate_gradients
        loss_Gmain.mean().mul(gain).backward()
      File "C:\Users\Dinner\.conda\envs\pytorch\lib\site-packages\torch\_tensor.py", line 363, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "C:\Users\Dinner\.conda\envs\pytorch\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
        Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
      File "C:\Users\Dinner\.conda\envs\pytorch\lib\site-packages\torch\autograd\function.py", line 253, in apply
        return user_fn(self, *args)
      File "F:\USB Backup\Clemson University\Course\Artificial Intellengent\Project\data-efficient-gans-master\data-efficient-gans-master\DiffAugment-stylegan2-pytorch\torch_utils\ops\conv2d_gradfix.py", line 133, in backward
        grad_weight = Conv2dGradWeight.apply(grad_output, input)
      File "F:\USB Backup\Clemson University\Course\Artificial Intellengent\Project\data-efficient-gans-master\data-efficient-gans-master\DiffAugment-stylegan2-pytorch\torch_utils\ops\conv2d_gradfix.py", line 145, in forward
        op = torch._C._jit_get_operation('aten::cudnn_convolution_backward_weight' if not transpose else 'aten::cudnn_convolution_transpose_backward_weight')
    RuntimeError: No such operator aten::cudnn_convolution_backward_weight

    My CUDA version:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2021 NVIDIA Corporation
    Built on Sun_Mar_21_19:24:09_Pacific_Daylight_Time_2021
    Cuda compilation tools, release 11.3, V11.3.58
    Build cuda_11.3.r11.3/compiler.29745058_0

    PyTorch version: 1.11.0

    How should I fix it?
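
    A hedged note: aten::cudnn_convolution_backward_weight is a private op that was removed in PyTorch 1.11, so the custom gradient op in torch_utils/ops/conv2d_gradfix.py no longer resolves. Two common workarounds, assuming the code matches the upstream StyleGAN2-ADA layout: downgrade to PyTorch ≤ 1.10, or disable the custom op so autograd falls back to the reference convolution gradients (slower and more memory-hungry). Note that training_loop.py may set the flag back to True, in which case edit it there instead:

    # hypothetical sketch, assuming conv2d_gradfix exposes a module-level flag
    from torch_utils.ops import conv2d_gradfix
    conv2d_gradfix.enabled = False  # fall back to torch.nn.functional gradients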

    opened by fancyshirt 1
  • How do you calculate accuracy in the paper?

    For example, in Figure 5 (b), you plot D's accuracy on T(x), T(G(z)), and G(z); I wonder how this metric was calculated. After each update of the discriminator, or after each update of both the discriminator and generator?

    opened by LuoXin-s 1
  • Generate images from a grayscale-trained pkl file

    Hello, I trained using a set of 100 grayscale images, and now I am stuck at generate.py because it is set up for generating RGB images. I get this error:

    ValueError: not enough image data

    or, when I changed the mode in generate.py from RGB to L:

    ValueError: Too many dimensions: 3 > 2.

    I have searched the internet for hours and tried a lot of things, but I am pretty new to Python and have run out of ideas about what to change in generate.py to get the images. Can somebody please help me?
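
    A hedged sketch of the usual fix: PIL's 'L' mode requires a 2-D array, hence "Too many dimensions: 3 > 2" when an (H, W, 1) sample is passed. Squeezing the channel axis before Image.fromarray typically works:

    import numpy as np
    from PIL import Image

    img = np.zeros((256, 256, 1), dtype=np.uint8)  # stands in for one generated sample
    Image.fromarray(img.squeeze(-1), 'L').save('out.png')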

    opened by velamini 2