[SIGGRAPH'22] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets

Last update: Jan 4, 2023

Overview

This repository contains code for our SIGGRAPH'22 paper "StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets" by Axel Sauer, Katja Schwarz, and Andreas Geiger.

If you find our code or paper useful, please cite

@InProceedings{Sauer2021ARXIV,
  author    = {Axel Sauer and Katja Schwarz and Andreas Geiger},
  title     = {StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets},
  journal   = {arXiv.org},
  volume    = {abs/2201.00273},
  year      = {2022},
  url       = {https://arxiv.org/abs/2201.00273},
}

Related Projects

Projected GANs Converge Faster (NeurIPS'21), Official Repo
StyleGAN-XL + CLIP (Implemented by CasualGANPapers)
StyleGAN-XL + CLIP (Modified by Katherine Crowson to optimize in W+ space)

ToDos

Initial code release
Add pretrained models (ImageNet{16,32,64,128,256,512,1024}, FFHQ{256,512,1024}, Pokemon{256,512,1024})
Add StyleMC for editing
Add PTI for inversion

Requirements

64-bit Python 3.8 and PyTorch 1.9.0 (or later). See https://pytorch.org for PyTorch install instructions.
CUDA toolkit 11.1 or later.
GCC 7 or later compilers. The recommended GCC version depends on your CUDA version; see for example, CUDA 11.4 system requirements.
If you run into problems when setting up the custom CUDA kernels, refer to the Troubleshooting docs of the original StyleGAN3 repo and to issue #23.
Windows users struggling to install the environment might find #10 helpful.
Use the following commands with Miniconda3 to create and activate your PG Python environment:
- conda env create -f environment.yml
- conda activate sgxl

Data Preparation

For a quick start, you can download the few-shot datasets provided by the authors of FastGAN here. To prepare the dataset at the respective resolution, run

python dataset_tool.py --source=./data/pokemon --dest=./data/pokemon256.zip \
  --resolution=256x256 --transform=center-crop

You need to follow our progressive growing scheme to get the best results. Therefore, you should prepare separate zips for each training resolution. You can get the datasets we used in our paper at their respective websites (FFHQ, ImageNet).
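As a minimal sketch of preparing one zip per training resolution, the helper below only builds the dataset_tool.py command lines shown above and prints them. The function name, the list of resolutions, and the ./data/<name><res>.zip naming are assumptions for illustration; adjust them to your own stages.

```python
import shlex


def dataset_tool_commands(name, source_dir, resolutions):
    """Return one dataset_tool.py command line per training resolution."""
    commands = []
    for res in resolutions:
        args = [
            "python", "dataset_tool.py",
            f"--source={source_dir}",
            f"--dest=./data/{name}{res}.zip",   # assumed naming: pokemon16.zip, pokemon32.zip, ...
            f"--resolution={res}x{res}",
            "--transform=center-crop",
        ]
        # Quote each argument so the line can be pasted into a shell safely.
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


# Hypothetical schedule: a 16x16 stem followed by four super-resolution stages.
cmds = dataset_tool_commands("pokemon", "./data/pokemon", [16, 32, 64, 128, 256])
for cmd in cmds:
    print(cmd)
```

The commands are printed rather than executed so that you can review them before running anything.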

Training

For progressive growing, we train a stem on low resolution, e.g., 16² pixels. When the stem is finished, i.e., FID is saturating, you can start training the upper stages; we refer to these as superresolution stages.

Training the stem

Training StyleGAN-XL on Pokemon using 8 GPUs:

python train.py --outdir=./training-runs/pokemon --cfg=stylegan3-t --data=./data/pokemon16.zip \
    --gpus=8 --batch=64 --mirror=1 --snap 10 --batch-gpu 8 --kimg 10000 --syn_layers 10

--batch specifies the overall batch size, --batch-gpu specifies the batch size per GPU. The training loop will automatically accumulate gradients if you use fewer GPUs until the overall batch size is reached.
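The relation between --batch, --batch-gpu, and --gpus can be illustrated with a little arithmetic. This is a sketch, not the repository's training loop: the function name is hypothetical, and the requirement that --batch is an exact multiple of --gpus times --batch-gpu is an assumption.

```python
def accumulation_rounds(batch, batch_gpu, gpus):
    """Number of forward/backward passes each GPU runs before one optimizer step."""
    per_pass = batch_gpu * gpus  # images processed across all GPUs in a single pass
    if batch % per_pass != 0:
        # Assumption: the overall batch must divide evenly across GPUs and passes.
        raise ValueError("--batch must be a multiple of --gpus times --batch-gpu")
    return batch // per_pass


# With --batch=64 and --batch-gpu=8, fewer GPUs mean more accumulation rounds.
for gpus in (8, 4, 2, 1):
    print(f"gpus={gpus}: {accumulation_rounds(64, 8, gpus)} accumulation round(s)")
```

With the 8-GPU command above, every optimizer step needs a single pass; on one GPU the same overall batch size would take eight passes.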

Samples and metrics are saved in outdir. If you don't want to track metrics, set --metrics=none. You can inspect fid50k_full.json or run tensorboard in training-runs/ to monitor the training progress.

For a class-conditional dataset (ImageNet, CIFAR-10), add the flag --cond True. The dataset needs to contain the class labels; see the StyleGAN2-ADA repo on how to prepare class-conditional datasets.

Training the super-resolution stages

Continuing with pretrained stem:

python train.py --outdir=./training-runs/pokemon --cfg=stylegan3-t --data=./data/pokemon32.zip \
  --gpus=8 --batch=64 --mirror=1 --snap 10 --batch-gpu 8 --kimg 10000 --syn_layers 10 \
  --superres --up_factor 2 --head_layers 7 \
  --path_stem training-runs/pokemon/00000-stylegan3-t-pokemon16-gpus8-batch64/best_model.pkl

--up_factor allows you to train several stages at once, i.e., with --up_factor=4 and a 16² stem you can directly train at resolution 64².
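The following sketch shows which resolutions a run passes through for a given stem and up-factor. The function name is hypothetical, and using the same --up_factor for every stage is a simplifying assumption; in practice each stage can pick its own factor.

```python
def growing_schedule(stem_res, final_res, up_factor=2):
    """List the resolutions of the super-resolution stages after the stem."""
    if up_factor < 2:
        raise ValueError("up_factor must be at least 2")
    schedule = []
    res = stem_res
    while res < final_res:
        res *= up_factor
        schedule.append(res)
    if res != final_res:
        # The final resolution is not reachable with this constant factor.
        raise ValueError(f"cannot reach {final_res} from {stem_res} with factor {up_factor}")
    return schedule


print(growing_schedule(16, 256))               # one stage per doubling
print(growing_schedule(16, 64, up_factor=4))   # a single stage from 16 to 64
```

Each entry corresponds to one dataset zip that has to be prepared at that resolution.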

If you have enough compute, a good tactic is to train several stages in parallel and then restart the superresolution stage training once in a while. The current stage will then reload its previous stem's best_model.pkl. Performance can sometimes drop at first because of domain shift, but the superresolution stage quickly recovers and improves further.

Training recommendations for datasets other than ImageNet

The default settings are tuned for ImageNet. For smaller datasets (<50k images) or well-curated datasets (FFHQ), you can significantly decrease the model size, enabling much faster training. Recommended settings are --cbase 128 --cmax 128 --syn_layers 4 and, for superresolution stages, --head_layers 4.

If you want to train as few stages as possible, we recommend training a 32x32 or 64x64 stem and then scaling directly to the final resolution (as described above, you must adjust --up_factor accordingly). In general, however, progressive growing yields better results faster because throughput is much higher at lower resolutions. This is illustrated in a figure by Karras et al., 2017.

Generating Samples & Interpolations

To generate samples and interpolation videos, run

python gen_images.py --outdir=out --trunc=0.7 --seeds=10-15 --batch-sz 1 \
  --network=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/pokemon256.pkl

and

python gen_video.py --output=lerp.mp4 --trunc=0.7 --seeds=0-31 --grid=4x2 \
  --network=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/pokemon256.pkl

For class-conditional models, you can pass the class index via --class. An index-to-label dictionary for ImageNet can be found here. For interpolation between classes, provide, e.g., --cls=0-31 to gen_video.py. The list of classes has to be the same length as --seeds.
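To check the length requirement before starting a long render, a small pre-flight sketch may help. The range syntax handled here (comma-separated values and inclusive "a-b" ranges) is an assumption modeled on the examples above, and both function names are hypothetical rather than part of the repository.

```python
def parse_range(spec):
    """Expand a string such as '0-31' or '1,4-6' into a list of integers."""
    values = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            values.extend(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            values.append(int(part))
    return values


def check_classes_match_seeds(seeds_spec, cls_spec):
    """Raise if the class list and the seed list differ in length."""
    seeds, classes = parse_range(seeds_spec), parse_range(cls_spec)
    if len(seeds) != len(classes):
        raise ValueError(f"{len(classes)} classes given for {len(seeds)} seeds")
    return list(zip(seeds, classes))


pairs = check_classes_match_seeds("0-31", "0-31")
print(len(pairs), "seed/class pairs")
```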

To generate a conditional sample sheet, run

python gen_class_samplesheet.py --outdir=sample_sheets --trunc=1.0 \
  --network=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet128.pkl \
  --samples-per-class 4 --classes 0-32 --grid-width 32

For ImageNet models, we enable multi-modal truncation (proposed by Self-Distilled GAN). We generated 600k samples and found 10k cluster centroids via k-means. For a given sample, multi-modal truncation finds the closest centroid and interpolates towards it. To switch from uni-modal to multi-modal truncation, pass

--centroids-path=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet_centroids.npy
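The difference between the two truncation modes can be sketched on plain Python lists. This is an illustration under stated assumptions, not the repository's implementation: latent codes are short lists instead of tensors, distance is Euclidean, and the interpolation uses the usual convention where psi=1 leaves the code unchanged and psi=0 collapses it onto the anchor. All function names are hypothetical.

```python
import math


def lerp(a, b, t):
    """Linear interpolation from a (t=0) to b (t=1), element-wise."""
    return [x + t * (y - x) for x, y in zip(a, b)]


def truncate_unimodal(w, w_avg, psi):
    """Pull w towards the single global mean w_avg."""
    return lerp(w_avg, w, psi)


def truncate_multimodal(w, centroids, psi):
    """Pull w towards its closest centroid instead of the global mean."""
    nearest = min(centroids, key=lambda c: math.dist(c, w))
    return lerp(nearest, w, psi)


w = [3.0, 1.0]
print(truncate_unimodal(w, [0.0, 0.0], psi=0.5))
print(truncate_multimodal(w, [[0.0, 0.0], [4.0, 0.0]], psi=0.5))
```

In the toy example the code stays close to its own cluster under multi-modal truncation, whereas uni-modal truncation drags it towards the global mean.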

Figure: comparison of no truncation, uni-modal truncation, and multi-modal truncation.

Image Editing

To use our reimplementation of StyleMC and generate the example above, run

python run_stylemc.py --outdir=stylemc_out \
  --text-prompt "a chimpanzee | laughter | happyness| happy chimpanzee | happy monkey | smile | grin" \
  --seeds 0-256 --class-idx 367 --layers 10-30 --edit-strength 0.75 --init-seed 49 \
  --network=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet128.pkl \
  --bigger-network https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet1024.pkl

Recommended workflow:

Sample images via gen_images.py.
Pick a sample and use it as the initial image for run_stylemc.py by providing --init-seed and --class-idx.
Find a direction in style space via --text-prompt.
Fine-tune --edit-strength, --layers, and the number of --seeds.
Once you have found a good setting, provide a larger model via --bigger-network. The script still optimizes the direction for the smaller model, but uses the bigger model for the final output.

Pretrained Models

We provide the following pretrained models (pass the url as PATH_TO_NETWORK_PKL):

Dataset	Res	FID	PATH
ImageNet	16²	0.73	https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet16.pkl
ImageNet	32²	1.11	https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet32.pkl
ImageNet	64²	1.52	https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet64.pkl
ImageNet	128²	1.77	https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet128.pkl
ImageNet	256²	2.26	https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet256.pkl
ImageNet	512²	2.42	https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet512.pkl
ImageNet	1024²	2.51	https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet1024.pkl
CIFAR10	32²	1.85	https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/cifar10.pkl
FFHQ	256²	2.19	https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/ffhq256.pkl
FFHQ	512²	2.23	https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/ffhq512.pkl
FFHQ	1024²	2.02	https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/ffhq1024.pkl
Pokemon	256²	23.97	https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/pokemon256.pkl
Pokemon	512²	23.82	https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/pokemon512.pkl
Pokemon	1024²	25.47	https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/pokemon1024.pkl
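Since the URLs in the table follow a regular pattern, a small lookup helper can save copy-pasting. The sketch below only encodes the entries listed in the table; the function and constant names are hypothetical, and nothing is downloaded.

```python
BASE_URL = "https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models"

# Resolutions listed in the table above, per dataset.
AVAILABLE = {
    "imagenet": [16, 32, 64, 128, 256, 512, 1024],
    "cifar10": [32],
    "ffhq": [256, 512, 1024],
    "pokemon": [256, 512, 1024],
}


def model_url(dataset, res):
    """Return the pickle URL for a dataset/resolution pair listed in the table."""
    dataset = dataset.lower()
    if res not in AVAILABLE.get(dataset, []):
        raise KeyError(f"no pretrained model listed for {dataset} at {res}x{res}")
    # The CIFAR-10 file name carries no resolution suffix.
    name = dataset if dataset == "cifar10" else f"{dataset}{res}"
    return f"{BASE_URL}/{name}.pkl"


print(model_url("pokemon", 256))
```

The returned string can be passed wherever the commands in this README expect --network or PATH_TO_NETWORK_PKL.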

Quality Metrics

By default, train.py tracks FID50k during training. To calculate metrics for a specific network snapshot, run

python calc_metrics.py --metrics=fid50k_full --network=PATH_TO_NETWORK_PKL

To see the available metrics, run

python calc_metrics.py --help

We provide precomputed FID statistics for all pretrained models:

wget https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/gan-metrics.zip
unzip gan-metrics.zip -d dnnlib/

Further Information

This repo builds on the codebase of StyleGAN3 and our previous project Projected GANs Converge Faster.

Comments

Error Running Demo

After following the installation instructions, I get the following error running Cuda 11.6 on an RTX 2080ti

Traceback (most recent call last):
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/train.py", line 332, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/train.py", line 317, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "/home/alex/Spring-2022/CV/DogeGAN/resources/stylegan_xl/train.py", line 104, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/train.py", line 49, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/training/training_loop.py", line 339, in training_loop
    loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, gain=phase.interval, cur_nimg=cur_nimg)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/training/loss.py", line 121, in accumulate_gradients
    loss_Gmain.backward()
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/torch_utils/ops/conv2d_gradfix.py", line 144, in backward
    grad_weight = Conv2dGradWeight.apply(grad_output, input)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/torch_utils/ops/conv2d_gradfix.py", line 173, in forward
    return torch._C._jit_get_operation(name)(weight_shape, grad_output, input, padding, stride, dilation, groups, *flags)
RuntimeError: No such operator aten::cudnn_convolution_transpose_backward_weight

opened by AlexKashi 19

AttributeError: 'VisionTransformer' object has no attribute 'forward_flex'

When I train the super-resolution stages, the following error is thrown:

Traceback (most recent call last):
  File "train.py", line 338, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "train.py", line 323, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train.py", line 106, in launch_training
    torch.multiprocessing.spawn(fn=subprocess_fn, args=(c, temp_dir), nprocs=c.num_gpus)
  File "/home/ubuntu/MyFiles/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/ubuntu/MyFiles/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/ubuntu/MyFiles/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/ubuntu/MyFiles/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/ubuntu/user_space/stylegan_xl/train.py", line 49, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/home/ubuntu/user_space/stylegan_xl/training/training_loop.py", line 170, in training_loop
    G = dnnlib.util.construct_class_by_name(**G_kwargs, **common_kwargs).train().requires_grad_(False).to(device) # subclass of torch.nn.Module
  File "/home/ubuntu/user_space/stylegan_xl/dnnlib/util.py", line 303, in construct_class_by_name
    return call_func_by_name(*args, func_name=class_name, **kwargs)
  File "/home/ubuntu/user_space/stylegan_xl/dnnlib/util.py", line 298, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "/home/ubuntu/user_space/stylegan_xl/torch_utils/persistence.py", line 104, in __init__
    super().__init__(*args, **kwargs)
  File "/home/ubuntu/user_space/stylegan_xl/training/networks_stylegan3_resetting.py", line 612, in __init__
    G_stem = legacy.load_network_pkl(f)['G_ema']
  File "/home/ubuntu/user_space/stylegan_xl/legacy.py", line 25, in load_network_pkl
    data = _LegacyUnpickler(f).load()
  File "/home/ubuntu/MyFiles/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1177, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'VisionTransformer' object has no attribute 'forward_flex'

I use stylegan2 and torch==1.10.1.

opened by leinace1001 10

Hello, may I ask how much data you used in your Pokemon image generation project?

Hello, could you share the Pokemon dataset? I have been working on a Pokemon image generation project recently, but the quality of the generated Pokemon is so poor (FID 93) that they look like a growling Cthulhu.

opened by jexz11 8

How to prepare the ImageNet dataset

Hello, your project mentions the use of the ImageNet dataset, but I have some problems reproducing it because dataset_tool provides no instructions for ImageNet. I would also like to know how to train effectively to reach the SOTA results you report on Papers With Code. Do you have a training plan? For example, which settings (usually written in a yaml file) are used for training the 16, 32, and 64 stages, and how long does it take to train ImageNet with these settings? Measured in V100 days, it may take quite long.

opened by yuxuany1 7

code and pretrained models

Dear StyleGAN_xl team,

Thank you for your great work. The results are amazing.

Do you plan to release the code and pretrained models? When will you release them?

Thank you for your help.

Best Wishes,

Zongze

opened by betterze 7

AttributeError

When I try to generate videos, I get the following error:

    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DummyMapping' object has no attribute 'w_avg'

I appreciate your help!

opened by Limbicnation 6

KeyError: FullyConnectedLayer

Does anyone know what this error is about?

KeyError                                  Traceback (most recent call last)
Input In [19], in <cell line: 16>()
     14 fetch_model(network_url[Model])
     16 with dnnlib.util.open_url(network_name) as f:
---> 17     G = legacy.load_network_pkl(f)['G_ema'].to(device) # type: ignore
     20 zs = torch.randn([10000, G.mapping.z_dim], device=device)
     21 cs = torch.zeros([10000, G.mapping.c_dim], device=device)

File /workspace/./stylegan_xl/legacy.py:25, in load_network_pkl(f, force_fp16)
     24 def load_network_pkl(f, force_fp16=False):
---> 25     data = _LegacyUnpickler(f).load()
     27     # Legacy TensorFlow pickle => convert.
     28     if isinstance(data, tuple) and len(data) == 3 and all(isinstance(net, _TFNetworkStub) for net in data):

File /workspace/./stylegan_xl/torch_utils/persistence.py:193, in _reconstruct_persistent_obj(meta)
    190 module = _src_to_module(meta.module_src)
    192 assert meta.type == 'class'
--> 193 orig_class = module.__dict__[meta.class_name]
    194 decorator_class = persistent_class(orig_class)
    195 obj = decorator_class.__new__(decorator_class)

KeyError: 'FullyConnectedLayer'

opened by JHendel-codes 5

a question

When I try to train your great project, something goes wrong. Do you know the reason for RuntimeError: No such operator aten::cudnn_convolution_backward_weight? Thank you very much.

opened by Youzebin 5

Resuming training produces identical images/no progress

After resuming from a stem network, the fakes will not change. The training continues and snapshots get produced, but they are visually identical to previous ones. (e.g., every snapshot will mimic fakes_init)

opened by kisenera 4

TypeError: 'NoneType' object is not subscriptable in Discriminator

I am trying to use the discriminator in imagenet512.pkl. This is how I load it:

network_pkl = config['network_pkl']
print(network_pkl)
with dnnlib.util.open_url(network_pkl) as f:
    module = legacy.load_network_pkl(f)
    G = module['G_ema']
    discriminator = module['D']
    G = G.eval().requires_grad_(False).to(device)

then:

class_indices = torch.full((real_img.shape[0],), config['image_class_idx']).cuda()
c = F.one_hot(class_indices, G.c_dim)
recon_pred = discriminator(recon_img, c)

However, I got

File "/apdcephfs/private_v_boooowang/liif/pg_modules/discriminator.py", line 203, in forward
    x_n = Normalize(feat.normstats['mean'], feat.normstats['std'])(x)
TypeError: 'NoneType' object is not subscriptable

What is the problem? Please help me with it. Thank you so much.

opened by Booooooooooo 4

Question about normalization

Hi, I noticed that the normalization of input images differs from the traditional normalization, x = (x - mean) / std:

def norm_with_stats(x, stats):
    x_ch0 = torch.unsqueeze(x[:, 0], 1) * (0.5 / stats['mean'][0]) + (0.5 - stats['std'][0]) / stats['mean'][0]
    x_ch1 = torch.unsqueeze(x[:, 1], 1) * (0.5 / stats['mean'][1]) + (0.5 - stats['std'][1]) / stats['mean'][1]
    x_ch2 = torch.unsqueeze(x[:, 2], 1) * (0.5 / stats['mean'][2]) + (0.5 - stats['std'][2]) / stats['mean'][2]
    x = torch.cat((x_ch0, x_ch1, x_ch2), 1)
    return x

Is there any paper or previous work that introduces this kind of normalization method?

opened by zqh0253 4

What is the point for keeping the same projection type?

As the code shows, no matter what the resolution is, the project_type is always 2. So what is the point of this line of code?

https://github.com/autonomousvision/stylegan_xl/blob/4241ff9cfeb69d617427107a75d69e9d1c2d92f2/train.py#L285

opened by anArkitek 0

Error when training stylegan2

When I use this command to train stylegan2:

python3 train.py --outdir=./out --cfg=stylegan2 --cond=True --data=./dataset/cifar10.zip --gpus=4 --batch=128 --mirror=True --batch-gpu 32 --cbase 16734 --cmax 256 --kimg 25000 --snap 200 --syn_layers 7 --head_layers 4

I get the following error:

TypeError: __init__() got an unexpected keyword argument 'num_layers'

But I can successfully train stylegan3-r and fastgan.

This is strange. How can I solve this problem? Has anyone successfully run these two models?

opened by mingqiJ 0

Unbalanced GPU memory usage

Hi! I found that GPU memory consumption is highly unbalanced between GPU0 and the rest of GPUs. Here's the command I used to train on imagenet with resolution 128.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python train.py \
  --outdir=/storage/guangrun/qijia_3d_model/stylegan-xl/finetune128/ \
  --cfg=stylegan3-t \
  --data=/datasets/guangrun/qijia_3d_model/imagenet/stylegan_xl/imagenet_sub_seg128.zip \
  --gpus=8 \
  --batch=32 \
  --mirror=1 \
  --snap 10 \
  --batch-gpu 4 \
  --kimg 10000 \
  --cond True \
  --superres \
  --up_factor 2 \
  --head_layers 7 \
  --path_stem /scratch/local/ssd/guangrun/qijia_3d_model/stylegan_xl/imagenet64.pkl \
  --resume /scratch/local/ssd/guangrun/qijia_3d_model/stylegan_xl/imagenet128.pkl

GPU0 consumes much less memory than the rest of the GPUs. May I ask what causes this imbalance and what the normal memory consumption is when training at resolution 128 with the settings above?

opened by Michaelsqj 1

Finetune on the pretrained model

Hi, thank you so much for sharing this great work. I'm thinking of fine-tuning the 128² ImageNet pretrained pkl model, but I'm not sure how to do that. For example, what should syn_layers, head_layers, etc. be, and which arguments should I pass to train.py? Could you provide an example?

opened by Michaelsqj 0

[Request] A method to resume training with a different batch size while keeping your G epoch and nkimg value.

In SG2 and SG3, provided you use a modified fork, you can resume training with a completely different batch size and still keep your tick/kimg progress by specifying it with the --nkimg option. For example, --nkimg=2500 resumes training with an assumed progress of 2500 kimg.

SGXL resets kimg to 0 if you change the batch size.

I found it extremely useful to start with a very low batch size, such as --batch=2 --glr=0.0008 --dlr=0.0006, to improve diversity, and then switch to a batch size of 32/64/128 for better FID once FID starts to bottom out with batch=2.

However, because the augmentation, the G epoch, and kimg reset in SGXL when doing this, I am having a really bad time.

opened by nom57 3
