Vision-aided GAN
video (3m) | website | paper
Can the collective knowledge from a large bank of pretrained vision models be leveraged to improve GAN training? If so, with so many models to choose from, which one(s) should be selected, and in what manner are they most effective?
We find that pretrained computer vision models can significantly improve performance when used in an ensemble of discriminators. We propose an effective selection mechanism: we probe the linear separability between real and fake samples in pretrained model embeddings, choose the most accurate model, and progressively add it to the discriminator ensemble. Our method improves GAN training in both limited-data and large-scale settings.
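The selection criterion can be sketched in a few lines (a rough illustration only, not the code in this repo): embed real and fake samples with one pretrained model, fit a linear classifier, and use its held-out accuracy as the separability score. probe_accuracy, feats_real, and feats_fake are placeholder names.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_accuracy(feats_real, feats_fake):
    # label real embeddings 1 and fake embeddings 0, then fit a linear probe
    X = np.concatenate([feats_real, feats_fake])
    y = np.concatenate([np.ones(len(feats_real)), np.zeros(len(feats_fake))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, stratify=y)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # higher held-out accuracy => real and fake are more linearly separable in this embedding
    return clf.score(X_te, y_te)

# pick the pretrained model whose embedding separates real from fake best, e.g.
# best_model = max(models, key=lambda m: probe_accuracy(embed(m, reals), embed(m, fakes)))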
Ensembling Off-the-shelf Models for GAN Training
Nupur Kumari, Richard Zhang, Eli Shechtman, Jun-Yan Zhu
arXiv 2112.09130, 2021
Quantitative Comparison
Our method outperforms recent GAN training methods by a large margin, especially in the limited-sample setting. For LSUN Cat, we achieve an FID similar to that of StyleGAN2 trained on the full dataset while using only 0.7% of the data. On the full datasets, our method improves FID by 1.5x to 2x on the cat, church, and horse categories of LSUN.
Example Results
Below, we show visual comparisons between the baseline StyleGAN2-ADA and our model (Vision-aided GAN) for the same randomly sampled latent codes.
Interpolation Videos
Latent interpolation results of models trained with our method on AnimalFace Cat (160 images), Dog (389 images), and Bridge-of-Sighs (100 photos).
Requirements
- 64-bit Python 3.8 and PyTorch 1.8.0 (or later). See https://pytorch.org/ for PyTorch install instructions.
- CUDA toolkit 11.0 or later.
- Python libraries: see requirements.txt
- StyleGAN2 code relies heavily on custom PyTorch extensions. For details, please refer to the stylegan2-ada-pytorch repo.
Setting up Off-the-shelf Computer Vision models
CLIP (ViT): we modify model.py to return intermediate features of the transformer model. To set it up, follow these steps:
git clone https://github.com/openai/CLIP.git
cp vision-aided-gan/training/clip_model.py CLIP/clip/model.py
cd CLIP
python setup.py install
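A quick sanity check that the patched CLIP imports and runs, using the standard CLIP API (the intermediate-feature outputs added by clip_model.py are repo-specific and not shown here); example.png is a placeholder image.

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # downloads the ViT-B/32 weights
image = preprocess(Image.open("example.png")).unsqueeze(0).to(device)
with torch.no_grad():
    features = model.encode_image(image)
print(features.shape)  # (1, 512) for ViT-B/32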
DINO (ViT): the model is automatically downloaded from torch hub.
VGG-16: the model is automatically downloaded.
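For illustration, these two backbones are typically fetched as shown below; the repo performs the downloads itself, and the exact checkpoints it uses may differ from these.

import torch
import torchvision

# DINO ViT-S/16 from torch hub (facebookresearch/dino)
dino = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')
# ImageNet-pretrained VGG-16 from torchvision
vgg = torchvision.models.vgg16(pretrained=True)
dino.eval()
vgg.eval()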
Swin-T (MoBY): create a pretrained-models directory and save the downloaded model there.
Swin-T (Object Detection): follow the steps below for setup. Download the model here and save it in the pretrained-models directory.
git clone https://github.com/SwinTransformer/Swin-Transformer-Object-Detection
cd Swin-Transformer-Object-Detection
pip install mmcv-full==1.3.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html
python setup.py install
For more details on mmcv installation, please refer here.
Swin-T (Segmentation): follow the steps below for setup. Download the model here and save it in the pretrained-models directory.
git clone https://github.com/SwinTransformer/Swin-Transformer-Semantic-Segmentation.git
cd Swin-Transformer-Semantic-Segmentation
python setup.py install
Face Parsing: download the model here and save it in the pretrained-models directory.
Face Normals: download the model here and save it in the pretrained-models directory.
Pretrained Models
Our final trained models can be downloaded at this link.
To generate images:
python generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 --network=<network.pkl>
The output is stored in the out directory, controlled by --outdir. Our generator architecture is the same as StyleGAN2's and can be used in Python code in the same way as described in stylegan2-ada-pytorch.
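A minimal sketch of sampling from a checkpoint in Python, following the stylegan2-ada-pytorch convention; run it from the repo root so the dnnlib/torch_utils modules needed to unpickle the network are importable, and network.pkl is a placeholder for a downloaded checkpoint.

import pickle
import torch

with open('network.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda()   # generator with exponential-moving-average weights
z = torch.randn([1, G.z_dim]).cuda()      # random latent code
c = None                                  # class labels; None for unconditional models
img = G(z, c, truncation_psi=1.0)         # NCHW float32 image in [-1, +1]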
Model evaluation:
python calc_metrics.py --network <network.pkl> --metrics fid50k_full --data <dataset> --clean 1
We use the clean-fid library to calculate the FID metric. For LSUN Church and LSUN Horse, we calculate the statistics over the full real distribution. For details on calculating the real distribution statistics, please refer to clean-fid. For the default FID evaluation of StyleGAN2-ADA, use --clean 0.
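A sketch of the clean-fid workflow for pre-computing statistics over the full real distribution and scoring generated samples against them; the folder paths and the statistics name below are placeholders.

from cleanfid import fid

# one-time: cache statistics of the full real distribution (e.g. LSUN Church or Horse)
fid.make_custom_stats("lsun_church_full", "path/to/real_images", mode="clean")

# afterwards: score a folder of generated images against the cached statistics
score = fid.compute_fid("path/to/generated_images",
                        dataset_name="lsun_church_full",
                        dataset_split="custom", mode="clean")
print(score)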
Datasets
Dataset preparation is the same as in stylegan2-ada-pytorch. Example setup for LSUN Church:
LSUN Church
git clone https://github.com/fyu/lsun.git
cd lsun
python3 download.py -c church_outdoor
unzip church_outdoor_train_lmdb.zip
cd ../vision-aided-gan
python dataset_tool.py --source <path-to>/church_outdoor_train_lmdb/ --dest <path-to-datasets>/church1k.zip --max-images 1000 --transform=center-crop --width=256 --height=256
Datasets can be downloaded from their respective websites:
FFHQ, LSUN Categories, AFHQ, AnimalFace Dog, AnimalFace Cat, 100-shot Bridge-of-Sighs
Training new networks
Model selection: returns the computer vision model with the highest linear-probe accuracy for the best-FID model in a folder, or for the given network file.
python model_selection.py --data mydataset.zip --network <mynetworkfolder or mynetworkpklfile>
Example command for training with a single pretrained network from scratch:
python train.py --outdir=training-models/ --data=mydataset.zip --gpus 2 --metrics fid50k_full --kimg 25000 --cfg paper256 --cv input-dino-output-conv_multi_level --cv-loss multilevel_s --augcv ada --ada-target-cv 0.3 --augpipecv bgc --batch 16 --mirror 1 --aug ada --augpipe bgc --snap 25 --warmup 1
Training configuration corresponding to training with vision-aided loss:
- --cv=input-dino-output-conv_multi_level : pretrained network and its configuration.
- --warmup=0 : should be enabled when training from scratch; introduces our loss after training with 500k images.
- --cv-loss=multilevel : which loss to use on the pretrained-model-based discriminator.
- --augcv=ada : performs ADA augmentation on the pretrained-model-based discriminator.
- --augcv=diffaugment-<policy> : performs DiffAugment on the pretrained-model-based discriminator with the given policy.
- --augpipecv=bgc : ADA augmentation strategy. Note: cutout is always enabled.
- --ada-target-cv=0.3 : adjusts the ADA target value for the pretrained-model-based discriminator.
- --exact-resume=0 : enables exact resume along with optimizer state.
Miscellaneous configurations:
- --appendname='' : additional string to append to the training directory name.
- --wandb-log=0 : enables wandb logging.
- --clean=0 : enables FID calculation using clean-fid if the real distribution statistics are pre-calculated.
Run python train.py --help for more details and the full list of arguments.
References
@article{kumari2021ensembling,
title={Ensembling Off-the-shelf Models for GAN Training},
author={Kumari, Nupur and Zhang, Richard and Shechtman, Eli and Zhu, Jun-Yan},
journal={arXiv preprint arXiv:2112.09130},
year={2021}
}
Acknowledgments
We thank Muyang Li, Sheng-Yu Wang, Chonghyuk (Andrew) Song for proofreading the draft. We are also grateful to Alexei A. Efros, Sheng-Yu Wang, Taesung Park, and William Peebles for helpful comments and discussion. Our codebase is built on stylegan2-ada-pytorch and DiffAugment.