ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement

Recently, the power of unconditional image synthesis has significantly advanced through the use of Generative Adversarial Networks (GANs). The task of inverting an image into its corresponding latent code of the trained GAN is of utmost importance as it allows for the manipulation of real images, leveraging the rich semantics learned by the network. Recognizing the limitations of current inversion approaches, in this work we present a novel inversion scheme that extends current encoder-based inversion methods by introducing an iterative refinement mechanism. Instead of directly predicting the latent code of a given image using a single pass, the encoder is tasked with predicting a residual with respect to the current estimate of the inverted latent code in a self-correcting manner. Our residual-based encoder, named ReStyle, attains improved accuracy compared to current state-of-the-art encoder-based methods with a negligible increase in inference time. We analyze the behavior of ReStyle to gain valuable insights into its iterative nature. We then evaluate the performance of our residual encoder and analyze its robustness compared to optimization-based inversion and state-of-the-art encoders.

Different from conventional encoder-based inversion techniques, our residual-based ReStyle scheme incorporates an iterative refinement mechanism to progressively converge to an accurate inversion of real images. For each domain, we show the input image on the left followed by intermediate inversion outputs.

Description

Official Implementation of our ReStyle paper for both training and evaluation. ReStyle introduces an iterative refinement mechanism which can be applied over different StyleGAN encoders for solving the StyleGAN inversion task.
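
The iterative refinement idea can be illustrated with a small toy sketch. It is not the repository's implementation: the function restyle_invert, the linear toy_generator, and the toy_encoder are hypothetical stand-ins chosen so that the loop runs without a GPU or trained weights. The only thing it is meant to show is that the encoder predicts a residual that is added to the current latent estimate at every step.

```python
from typing import Callable, List

Vector = List[float]


def restyle_invert(
    x: Vector,
    encoder: Callable[[Vector, Vector], Vector],
    generator: Callable[[Vector], Vector],
    w_init: Vector,
    n_iters: int = 5,
) -> List[Vector]:
    """Return the latent estimate obtained after each refinement step."""
    w = list(w_init)
    y = generator(w)  # reconstruction of the initial estimate
    latents = []
    for _ in range(n_iters):
        # The encoder sees the target and the current reconstruction and
        # predicts a residual (delta), not a full latent code.
        delta = encoder(x, y)
        w = [wi + di for wi, di in zip(w, delta)]
        y = generator(w)
        latents.append(list(w))
    return latents


# Hypothetical toy stand-ins, not real StyleGAN components.
def toy_generator(w: Vector) -> Vector:
    return [2.0 * wi for wi in w]


def toy_encoder(x: Vector, y: Vector) -> Vector:
    # The exact correction would be (x - y) / 2; this toy encoder only
    # moves halfway there, so several steps are needed to converge.
    return [0.25 * (xi - yi) for xi, yi in zip(x, y)]


if __name__ == "__main__":
    target = toy_generator([1.0, -3.0])
    for step, w in enumerate(
        restyle_invert(target, toy_encoder, toy_generator, w_init=[0.0, 0.0]), start=1
    ):
        print(step, w)
```

In this toy setting the error is halved at every step, so the estimates approach the true latent [1.0, -3.0] without ever being predicted in a single pass.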

Getting Started

Prerequisites

Linux or macOS
NVIDIA GPU + CUDA CuDNN (CPU may be possible with some modifications, but is not inherently supported)
Python 3

Installation

Dependencies:
We recommend running this repository using Anaconda. All dependencies for defining the environment are provided in environment/restyle_env.yaml.

Pretrained Models

In this repository, we provide pretrained ReStyle encoders applied over the pSp and e4e encoders across various domains.

Please download the pretrained models from the following links.

ReStyle + pSp

Path	Description
FFHQ - ReStyle + pSp	ReStyle applied over pSp trained on the FFHQ dataset.
Stanford Cars - ReStyle + pSp	ReStyle applied over pSp trained on the Stanford Cars dataset.
LSUN Church - ReStyle + pSp	ReStyle applied over pSp trained on the LSUN Church dataset.
AFHQ Wild - ReStyle + pSp	ReStyle applied over pSp trained on the AFHQ Wild dataset.

ReStyle + e4e

Path	Description
FFHQ - ReStyle + e4e	Coming Soon!
Stanford Cars - ReStyle + e4e	Coming Soon!
LSUN Church - ReStyle + e4e	Coming Soon!
AFHQ Wild - ReStyle + e4e	Coming Soon!
LSUN Horse - ReStyle + e4e	ReStyle applied over e4e trained on the LSUN Horse dataset.

Auxiliary Models

In addition, we provide various auxiliary models needed for training your own ReStyle models from scratch.
This includes the StyleGAN generators and pre-trained models used for loss computation.

Path	Description
FFHQ StyleGAN	StyleGAN2 model trained on FFHQ with 1024x1024 output resolution.
LSUN Car StyleGAN	StyleGAN2 model trained on LSUN Car with 512x384 output resolution.
LSUN Church StyleGAN	StyleGAN2 model trained on LSUN Church with 256x256 output resolution.
LSUN Horse StyleGAN	StyleGAN2 model trained on LSUN Horse with 256x256 output resolution.
AFHQ Wild StyleGAN	StyleGAN-ADA model trained on AFHQ Wild with 512x512 output resolution.
IR-SE50 Model	Pretrained IR-SE50 model taken from TreB1eN for use in our ID loss and encoder backbone on the human facial domain.
ResNet-34 Model	ResNet-34 model trained on ImageNet taken from torchvision for initializing our encoder backbone.
MoCov2 Model	Pretrained ResNet-50 model trained using MoCov2 for computing the MoCo-based loss on non-facial domains. The model is taken from the official implementation.
CurricularFace Backbone	Pretrained CurricularFace model taken from HuangYG123 for use in ID similarity metric computation.
MTCNN	Weights for MTCNN model taken from TreB1eN for use in ID similarity metric computation. (Unpack the tar.gz to extract the 3 model weights.)

Note: all StyleGAN models are converted from the official TensorFlow models to PyTorch using the conversion script from rosinality.

By default, we assume that all auxiliary models are downloaded and saved to the directory pretrained_models. However, you may use your own paths by changing the necessary values in configs/paths_config.py.

Training

Preparing your Data

In order to train ReStyle on your own data, you should perform the following steps:

Update configs/paths_config.py with the necessary data paths and model paths for training and inference.

dataset_paths = {
    'train_data': '/path/to/train/data',
    'test_data': '/path/to/test/data',
}

Configure a new dataset under the DATASETS variable defined in configs/data_configs.py. There, you should define the source/target data paths for the train and test sets as well as the transforms to be used for training and inference.

DATASETS = {
	'my_data_encode': {
		'transforms': transforms_config.EncodeTransforms,   # can define a custom transform, if desired
		'train_source_root': dataset_paths['train_data'],
		'train_target_root': dataset_paths['train_data'],
		'test_source_root': dataset_paths['test_data'],
		'test_target_root': dataset_paths['test_data'],
	}
}

To train with your newly defined dataset, simply use the flag --dataset_type my_data_encode.
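
Before launching a long training run, it can help to confirm that the new entry is complete. The following is a minimal sketch of such a check; check_dataset_entry is a hypothetical helper that is not part of the repository, and it only assumes the five keys shown in the DATASETS example above.

```python
import os
from typing import Dict, List

REQUIRED_KEYS = (
    "transforms",
    "train_source_root",
    "train_target_root",
    "test_source_root",
    "test_target_root",
)


def check_dataset_entry(name: str, datasets: Dict[str, dict]) -> List[str]:
    """Return a list of problems; an empty list means the entry looks usable."""
    if name not in datasets:
        return [f"dataset type '{name}' is not defined"]
    entry = datasets[name]
    problems = [f"missing key '{key}'" for key in REQUIRED_KEYS if key not in entry]
    # Every key except 'transforms' is expected to point at a directory.
    for key in REQUIRED_KEYS[1:]:
        path = entry.get(key)
        if path is not None and not os.path.isdir(path):
            problems.append(f"{key} is not a directory: {path}")
    return problems
```

Calling check_dataset_entry('my_data_encode', DATASETS) and printing the returned list is one way to catch a mistyped path early.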

Preparing your Generator

In this work, we use rosinality's StyleGAN2 implementation. If you wish to use your own generator trained using NVIDIA's implementation, there are a few options we recommend:

Using NVIDIA's StyleGAN2 / StyleGAN-ADA TensorFlow implementation.
You can then convert the TensorFlow .pkl checkpoints to the supported format using the conversion script found in rosinality's implementation.
Using NVIDIA's StyleGAN-ADA PyTorch implementation.
You can then convert the PyTorch .pkl checkpoints to the supported format using the conversion script created by Justin Pinkney found in dvschultz's fork.

Once you have the converted .pt files, you should be ready to use them in this repository.

Training ReStyle

The main training scripts can be found in scripts/train_restyle_psp.py and scripts/train_restyle_e4e.py. Each of the two scripts will run ReStyle applied over the corresponding base inversion method.
Intermediate training results are saved to opts.exp_dir. This includes checkpoints, train outputs, and test outputs.
Additionally, if you have tensorboard installed, you can visualize tensorboard logs in opts.exp_dir/logs.

We currently support applying ReStyle on the pSp encoder from Richardson et al. [2020] and the e4e encoder from Tov et al. [2021].

Training ReStyle with the settings used in the paper can be done by running the following commands.

ReStyle applied over pSp:

python scripts/train_restyle_psp.py \
--dataset_type=ffhq_encode \
--encoder_type=BackboneEncoder \
--exp_dir=experiment/restyle_psp_ffhq_encode \
--workers=8 \
--batch_size=8 \
--test_batch_size=8 \
--test_workers=8 \
--val_interval=5000 \
--save_interval=10000 \
--start_from_latent_avg \
--lpips_lambda=0.8 \
--l2_lambda=1 \
--w_norm_lambda=0 \
--id_lambda=0.1 \
--input_nc=6 \
--n_iters_per_batch=5 \
--output_size=1024 \
--stylegan_weights=pretrained_models/stylegan2-ffhq-config-f.pt

ReStyle applied over e4e:

python scripts/train_restyle_e4e.py \
--dataset_type ffhq_encode \
--encoder_type ProgressiveBackboneEncoder \
--exp_dir=experiment/restyle_e4e_ffhq_encode \
--workers=8 \
--batch_size=8 \
--test_batch_size=8 \
--test_workers=8 \
--start_from_latent_avg \
--lpips_lambda=0.8 \
--l2_lambda=1 \
--delta_norm_lambda 0.0002 \
--id_lambda 0.1 \
--use_w_pool \
--w_discriminator_lambda 0.1 \
--progressive_start 20000 \
--progressive_step_every 2000 \
--input_nc 6 \
--n_iters_per_batch=5 \
--output_size 1024 \
--stylegan_weights=pretrained_models/stylegan2-ffhq-config-f.pt
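
The --progressive_start and --progressive_step_every flags appear to define a schedule of training stages. The sketch below reproduces such a schedule under two assumptions that are not stated in this README: the generator has 2 * log2(output_size) - 2 style inputs, and one further stage begins every progressive_step_every steps once progressive_start is reached. The function name progressive_schedule is hypothetical and is not part of the repository.

```python
import math
from typing import List


def progressive_schedule(output_size: int, start: int, step_every: int) -> List[int]:
    """Return the training step at which each stage begins (assumed layout)."""
    # Assumption: number of style inputs of a StyleGAN2 generator.
    n_styles = 2 * int(math.log2(output_size)) - 2
    # Assumption: the first stage starts at step 0, the rest are spaced evenly.
    n_later_stages = n_styles - 1
    return [0] + [start + i * step_every for i in range(n_later_stages)]


if __name__ == "__main__":
    print(progressive_schedule(1024, start=20000, step_every=2000))
```

With the values used in the command above, the result has 18 entries running from 0 and 20000 up to 52000, which agrees with the progressive_steps list printed in the options dump of the training log further down.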

Additional Notes:

Encoder backbones:
- For the human facial domain (ffhq_encode), we use an IR-SE50 backbone using the flags:
  - --encoder_type=BackboneEncoder for pSp
  - --encoder_type=ProgressiveBackboneEncoder for e4e
- For all other domains, we use a ResNet34 encoder backbone using the flags:
  - --encoder_type=ResNetBackboneEncoder for pSp
  - --encoder_type=ResNetProgressiveBackboneEncoder for e4e
ID/similarity losses:
- For the human facial domain we also use a specialized ID loss which is set using the flag --id_lambda=0.1.
- For all other domains, please set --id_lambda=0 and --moco_lambda=0.5 to use the MoCo-based similarity loss from Tov et al.
  - Note, you cannot set both id_lambda and moco_lambda to be active simultaneously.
You should also adjust the --output_size and --stylegan_weights flags according to your StyleGAN generator.
See options/train_options.py and options/e4e_train_options.py for all training-specific flags.
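
The notes above amount to a small decision table. As a rough sketch, they could be encoded as follows; suggest_flags is a hypothetical helper, not something the repository provides, and it simply restates the flag values listed above.

```python
from typing import Dict, Union

FlagValue = Union[str, float]


def suggest_flags(dataset_type: str, base: str) -> Dict[str, FlagValue]:
    """Pick encoder and similarity-loss flags for a domain and base encoder."""
    if base not in ("psp", "e4e"):
        raise ValueError("base must be 'psp' or 'e4e'")
    if dataset_type == "ffhq_encode":
        # Human facial domain: IR-SE50 backbone and the ID loss.
        encoder = "BackboneEncoder" if base == "psp" else "ProgressiveBackboneEncoder"
        return {"encoder_type": encoder, "id_lambda": 0.1, "moco_lambda": 0}
    # All other domains: ResNet34 backbone and the MoCo-based loss.
    encoder = (
        "ResNetBackboneEncoder" if base == "psp" else "ResNetProgressiveBackboneEncoder"
    )
    return {"encoder_type": encoder, "id_lambda": 0, "moco_lambda": 0.5}


if __name__ == "__main__":
    for key, value in suggest_flags("my_data_encode", "e4e").items():
        print(f"--{key}={value}")
```

Because exactly one of id_lambda and moco_lambda is non-zero in each branch, the helper never activates both losses at once.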

Inference Notebook

To help visualize the results of ReStyle we provide a Jupyter notebook found in notebooks/inference_playground.ipynb.
The notebook will download the pretrained models and run inference on the images found in notebooks/images or on images of your choosing. It is recommended to run this in Google Colab.

Testing

Inference

You can use scripts/inference_iterative.py to apply a trained model on a set of images:

python scripts/inference_iterative.py \
--exp_dir=/path/to/experiment \
--checkpoint_path=experiment/checkpoints/best_model.pt \
--data_path=/path/to/test_data \
--test_batch_size=4 \
--test_workers=4 \
--n_iters_per_batch=5

This script will save each step's outputs in a separate sub-directory (e.g., the outputs of step i will be saved in /path/to/experiment/inference_results/i).

Notes:

By default, the images will be saved at their original output resolutions (e.g., 1024x1024 for faces, 512x384 for cars). If you wish to save outputs resized to resolutions of 256x256 (or 256x192 for cars), you can do so by adding the flag --resize_outputs.
This script will also save all the latents as an .npy file in a dictionary format as follows:

{
    "0.jpg": [latent_step_1, latent_step_2, ..., latent_step_N],
    "1.jpg": [latent_step_1, latent_step_2, ..., latent_step_N],
    ...
}

That is, the keys of the dictionary are the image file names, and each value is a list of length N holding the output latent of each step, where N is the number of inference steps. Each element in the list has shape (K, 512), where K is the number of style inputs of the generator.

You can use the saved latents to perform latent space manipulations, for example.
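
A minimal sketch of reading such a file back with NumPy is shown below. The helper names load_latents and final_latents are hypothetical, and the file path is whatever the inference script wrote in your experiment directory; the only assumption is that the file holds the dictionary described above.

```python
import numpy as np


def load_latents(path):
    """Load the {file name: [latent per step]} dictionary from an .npy file."""
    # A dictionary is stored as a pickled object inside the .npy file, so
    # allow_pickle=True is required. Only enable it for files you trust.
    return np.load(path, allow_pickle=True).item()


def final_latents(latents):
    """Map each image name to the latent of its last inference step."""
    return {name: np.asarray(steps[-1]) for name, steps in latents.items()}
```

The arrays returned by final_latents each have shape (K, 512) and can serve as starting points for latent space edits.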

Step-by-Step Inference

Visualizing the intermediate outputs. Here, the intermediate outputs are saved from left to right with the input image shown on the right-hand side.

Sometimes, you may wish to save each step's outputs side-by-side instead of in separate sub-folders. This would allow one to easily see the progression in the reconstruction with each step. To save the step-by-step outputs as a single image, you can run the following:

python scripts/inference_iterative_save_coupled.py \
--exp_dir=/path/to/experiment \
--checkpoint_path=experiment/checkpoints/best_model.pt \
--data_path=/path/to/test_data \
--test_batch_size=4 \
--test_workers=4 \
--n_iters_per_batch=5

Computing Metrics

Given a trained model and generated outputs, we can compute the loss metrics on a given dataset.
These scripts receive the inference output directory and ground truth directory.

Calculating LPIPS loss:

python scripts/calc_losses_on_images.py \
--mode lpips \
--output_path=/path/to/experiment/inference_results \
--gt_path=/path/to/test_images

Calculating L2 loss:

python scripts/calc_losses_on_images.py \
--mode l2 \
--output_path=/path/to/experiment/inference_results \
--gt_path=/path/to/test_images

Calculating the identity loss for the human facial domain:

python scripts/calc_id_loss_parallel.py \
--output_path=/path/to/experiment/inference_results \
--gt_path=/path/to/test_images

These scripts will traverse through each sub-directory of output_path to compute the metrics on each step's output images.
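
The traversal can be pictured with the following sketch, which is not the code of the scripts themselves. It assumes that each step's sub-directory is named by its step index and that an output image shares its file name with the corresponding ground-truth image; iter_step_pairs is a hypothetical name.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def _step_order(directory: Path):
    # Numeric names sort by value (so "10" comes after "2"), others by name.
    name = directory.name
    return (not name.isdigit(), int(name) if name.isdigit() else 0, name)


def iter_step_pairs(
    output_path: str, gt_path: str
) -> Iterator[Tuple[str, List[Tuple[Path, Path]]]]:
    """Yield (step name, [(output image, ground-truth image), ...]) per step."""
    gt_dir = Path(gt_path)
    step_dirs = sorted(
        (d for d in Path(output_path).iterdir() if d.is_dir()), key=_step_order
    )
    for step_dir in step_dirs:
        pairs = []
        for out_img in sorted(step_dir.iterdir()):
            gt_img = gt_dir / out_img.name
            # Skip outputs that have no ground-truth image of the same name.
            if out_img.is_file() and gt_img.is_file():
                pairs.append((out_img, gt_img))
        yield step_dir.name, pairs
```

A metric would then be averaged over the pairs of each step separately, giving one value per refinement step.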

Encoder Bootstrapping

Image toonification results using our proposed encoder bootstrapping technique.

In the paper, we introduce an encoder bootstrapping technique that can be used to solve the image toonification task by pairing an FFHQ-based encoder with a Toon-based encoder.
Below we provide the models used to generate the results in the paper:

Path	Description
FFHQ - ReStyle + pSp	Same FFHQ encoder as linked above.
Toonify - ReStyle + pSp	ReStyle applied over pSp trained for the image toonification task.
Toonify Generator	Toonify generator from Doron Adler and Justin Pinkney converted to PyTorch using rosinality's conversion script.

Note that the ReStyle toonify model is trained using only real images with no paired data. More details regarding the training parameters and settings of the toonify encoder can be found here.

If you wish to run inference using these two models and the bootstrapping technique you may run the following:

python scripts/encoder_bootstrapping_inference.py \
--exp_dir=/path/to/experiment \
--model_1_checkpoint_path=/path/to/restyle_psp_ffhq_encode.pt \
--model_2_checkpoint_path=/path/to/restyle_psp_toonify.pt \
--data_path=/path/to/test_data \
--test_batch_size=4 \
--test_workers=4 \
--n_iters_per_batch=1  # one step for each encoder is typically good

Here, the per-step outputs are saved side-by-side, with the inverted real image used for initialization on the left and the original input image on the right.

Repository structure

Path	Description
SAM	Repository root folder
├ configs	Folder containing configs defining model/data paths and data transforms
├ criteria	Folder containing various loss criteria for training
├ datasets	Folder with various dataset objects
├ docs	Folder containing images displayed in the README
├ environment	Folder containing Anaconda environment used in our experiments
├ licenses	Folder containing licenses of the open source projects used in this repository
├ models	Folder containing all the models and training objects
│ ├ e4e_modules	Folder containing the latent discriminator implementation from encoder4editing
│ ├ encoders	Folder containing various architecture implementations including our simplified encoder architectures
│ ├ mtcnn	MTCNN implementation from TreB1eN
│ ├ stylegan2	StyleGAN2 model from rosinality
│ ├ psp.py	Implementation of pSp encoder extended to work with ReStyle
│ └ e4e.py	Implementation of e4e encoder extended to work with ReStyle
├ notebooks	Folder with Jupyter notebook containing the ReStyle inference playground
├ options	Folder with training and test command-line options
├ scripts	Folder with running scripts for training, inference, and metric computations
├ training	Folder with main training logic and Ranger implementation from lessw2020
├ utils	Folder with various utility functions

Credits

StyleGAN2 model and implementation:
https://github.com/rosinality/stylegan2-pytorch
Copyright (c) 2019 Kim Seonghyeon
License (MIT) https://github.com/rosinality/stylegan2-pytorch/blob/master/LICENSE

IR-SE50 model and implementations:
https://github.com/TreB1eN/InsightFace_Pytorch
Copyright (c) 2018 TreB1eN
License (MIT) https://github.com/TreB1eN/InsightFace_Pytorch/blob/master/LICENSE

Ranger optimizer implementation:
https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
License (Apache License 2.0) https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer/blob/master/LICENSE

LPIPS model and implementation:
https://github.com/S-aiueo32/lpips-pytorch
Copyright (c) 2020, Sou Uchida
License (BSD 2-Clause) https://github.com/S-aiueo32/lpips-pytorch/blob/master/LICENSE

pSp model and implementation:
https://github.com/eladrich/pixel2style2pixel
Copyright (c) 2020 Elad Richardson, Yuval Alaluf
License (MIT) https://github.com/eladrich/pixel2style2pixel/blob/master/LICENSE

e4e model and implementation:
https://github.com/omertov/encoder4editing
Copyright (c) 2021 omertov
License (MIT) https://github.com/omertov/encoder4editing/blob/main/LICENSE

Please Note: The CUDA files under the StyleGAN2 ops directory are made available under the Nvidia Source Code License-NC

Acknowledgments

This code borrows heavily from pixel2style2pixel and encoder4editing.

Citation

If you use this code for your research, please cite the following works:

@misc{alaluf2021restyle,
      title={ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement}, 
      author={Yuval Alaluf and Or Patashnik and Daniel Cohen-Or},
      year={2021},
      eprint={2104.02699},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@article{tov2021designing,
  title={Designing an Encoder for StyleGAN Image Manipulation},
  author={Tov, Omer and Alaluf, Yuval and Nitzan, Yotam and Patashnik, Or and Cohen-Or, Daniel},
  journal={arXiv preprint arXiv:2102.02766},
  year={2021}
}

@article{richardson2020encoding,
  title={Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation},
  author={Richardson, Elad and Alaluf, Yuval and Patashnik, Or and Nitzan, Yotam and Azar, Yaniv and Shapiro, Stav and Cohen-Or, Daniel},
  journal={arXiv preprint arXiv:2008.00951},
  year={2020}
}

Hi, the issue is the same as the title says.

I tried to train ReStyle with max_steps=100 using the following command.

python scripts/train_restyle_e4e.py \
--max_steps 100 \
--dataset_type my_ffhq_encode \
--encoder_type ProgressiveBackboneEncoder \
--exp_dir=experiment/restyle_e4e_ffhq_encode \
--save_training_data \
--workers=2 \
--batch_size=8 \
--test_batch_size=8 \
--test_workers=2 \
--start_from_latent_avg \
--lpips_lambda=0.8 \
--l2_lambda=1 \
--delta_norm_lambda 0.0002 \
--id_lambda 0.1 \
--use_w_pool \
--w_discriminator_lambda 0.1 \
--progressive_start 20000 \
--progressive_step_every 2000 \
--input_nc 6 \
--n_iters_per_batch=5 \
--output_size 1024 \
--stylegan_weights=pretrained_models/stylegan2-ffhq-config-f.pt

The log is printed up to the 100th step at 1920.9 seconds (~32 minutes). After that nothing more is printed, but the training code keeps running indefinitely, and the notebook crashes once it reaches Kaggle's 12-hour limit. The log from the Kaggle notebook run is included below.

Could you please guide me through this issue?

Time | # | Log Message
-- | -- | --
10.9s | 1 | /opt/conda/lib/python3.7/site-packages/papermill/iorw.py:50: FutureWarning: pyarrow.HadoopFileSystem is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead.
10.9s | 2 | from pyarrow import HadoopFileSystem
11.6s | 3 | Sun Mar 27 09:03:41 2022
11.6s | 4 | +-----------------------------------------------------------------------------+
11.6s | 5 | \| NVIDIA-SMI 450.119.04   Driver Version: 450.119.04   CUDA Version: 11.0     \|
11.6s | 6 | \|-------------------------------+----------------------+----------------------+
11.6s | 7 | \| GPU  Name        Persistence-M\| Bus-Id        Disp.A \| Volatile Uncorr. ECC \|
11.6s | 8 | \| Fan  Temp  Perf  Pwr:Usage/Cap\|         Memory-Usage \| GPU-Util  Compute M. \|
11.6s | 9 | \|                               \|                      \|               MIG M. \|
11.6s | 10 | \|===============================+======================+======================\|
11.6s | 11 | \|   0  Tesla P100-PCIE...  Off  \| 00000000:00:04.0 Off \|                    0 \|
11.6s | 12 | \| N/A   33C    P0    26W / 250W \|      0MiB / 16280MiB \|      0%      Default \|
11.6s | 13 | \|                               \|                      \|                  N/A \|
11.6s | 14 | +-------------------------------+----------------------+----------------------+
11.6s | 15 |  
11.6s | 16 | +-----------------------------------------------------------------------------+
11.6s | 17 | \| Processes:                                                                  \|
11.6s | 18 | \|  GPU   GI   CI        PID   Type   Process name                  GPU Memory \|
11.6s | 19 | \|        ID   ID                                                   Usage      \|
11.6s | 20 | \|=============================================================================\|
11.6s | 21 | \|  No running processes found                                                 \|
11.6s | 22 | +-----------------------------------------------------------------------------+
142.4s | 86 | {'batch_size': 8,
142.4s | 87 | 'board_interval': 50,
142.4s | 88 | 'checkpoint_path': None,
142.4s | 89 | 'd_reg_every': 16,
142.4s | 90 | 'dataset_type': 'my_ffhq_encode',
142.4s | 91 | 'delta_norm': 2,
142.4s | 92 | 'delta_norm_lambda': 0.0002,
142.4s | 93 | 'encoder_type': 'ProgressiveBackboneEncoder',
142.4s | 94 | 'exp_dir': 'experiment/restyle_e4e_ffhq_encode',
142.4s | 95 | 'id_lambda': 0.1,
142.4s | 96 | 'image_interval': 100,
142.4s | 97 | 'input_nc': 6,
142.4s | 98 | 'l2_lambda': 1.0,
142.4s | 99 | 'learning_rate': 0.0001,
142.4s | 100 | 'lpips_lambda': 0.8,
142.4s | 101 | 'max_steps': 100,
142.4s | 102 | 'moco_lambda': 0,
142.4s | 103 | 'n_iters_per_batch': 5,
142.4s | 104 | 'optim_name': 'ranger',
142.4s | 105 | 'output_size': 1024,
142.4s | 106 | 'progressive_start': 20000,
142.4s | 107 | 'progressive_step_every': 2000,
142.4s | 108 | 'progressive_steps': [0,
142.4s | 109 | 20000,
142.4s | 110 | 22000,
142.4s | 111 | 24000,
142.4s | 112 | 26000,
142.4s | 113 | 28000,
142.4s | 114 | 30000,
142.4s | 115 | 32000,
142.4s | 116 | 34000,
142.4s | 117 | 36000,
142.4s | 118 | 38000,
142.4s | 119 | 40000,
142.4s | 120 | 42000,
142.4s | 121 | 44000,
142.4s | 122 | 46000,
142.4s | 123 | 48000,
142.4s | 124 | 50000,
142.4s | 125 | 52000],
142.4s | 126 | 'r1': 10,
142.4s | 127 | 'resume_training_from_ckpt': None,
142.4s | 128 | 'save_interval': None,
142.4s | 129 | 'save_training_data': True,
142.4s | 130 | 'start_from_latent_avg': True,
142.4s | 131 | 'stylegan_weights': 'pretrained_models/stylegan2-ffhq-config-f.pt',
142.4s | 132 | 'sub_exp_dir': None,
142.4s | 133 | 'test_batch_size': 8,
142.4s | 134 | 'test_workers': 2,
142.4s | 135 | 'train_decoder': False,
142.4s | 136 | 'update_param_list': None,
142.4s | 137 | 'use_w_pool': True,
142.4s | 138 | 'val_interval': 1000,
142.4s | 139 | 'w_discriminator_lambda': 0.1,
142.4s | 140 | 'w_discriminator_lr': 2e-05,
142.4s | 141 | 'w_norm_lambda': 0,
142.4s | 142 | 'w_pool_size': 50,
142.4s | 143 | 'workers': 2}
144.5s | 144 | Loading encoders weights from irse50!
147.6s | 145 | Loading decoder weights from pretrained path: pretrained_models/stylegan2-ffhq-config-f.pt
148.3s | 146 | +------------------------------------------+------------+
148.3s | 147 | \|                 Modules                  \| Parameters \|
148.3s | 148 | +------------------------------------------+------------+
148.3s | 149 | \|       encoder.input_layer.0.weight       \|    3456    \|
148.3s | 150 | \|       encoder.input_layer.1.weight       \|     64     \|
148.3s | 151 | \|        encoder.input_layer.1.bias        \|     64     \|
148.3s | 152 | \|       encoder.input_layer.2.weight       \|     64     \|
148.3s | 153 | \|    encoder.body.0.res_layer.0.weight     \|     64     \|
148.3s | 154 | \|     encoder.body.0.res_layer.0.bias      \|     64     \|
148.3s | 155 | \|    encoder.body.0.res_layer.1.weight     \|   36864    \|
148.3s | 156 | \|    encoder.body.0.res_layer.2.weight     \|     64     \|
148.3s | 157 | \|    encoder.body.0.res_layer.3.weight     \|   36864    \|
148.3s | 158 | \|    encoder.body.0.res_layer.4.weight     \|     64     \|
148.3s | 159 | \|     encoder.body.0.res_layer.4.bias      \|     64     \|
148.3s | 160 | \|  encoder.body.0.res_layer.5.fc1.weight   \|    256     \|
148.3s | 161 | \|  encoder.body.0.res_layer.5.fc2.weight   \|    256     \|
148.3s | 162 | \|    encoder.body.1.res_layer.0.weight     \|     64     \|
148.3s | 163 | \|     encoder.body.1.res_layer.0.bias      \|     64     \|
148.3s | 164 | \|    encoder.body.1.res_layer.1.weight     \|   36864    \|
148.3s | 165 | \|    encoder.body.1.res_layer.2.weight     \|     64     \|
148.3s | 166 | \|    encoder.body.1.res_layer.3.weight     \|   36864    \|
148.3s | 167 | \|    encoder.body.1.res_layer.4.weight     \|     64     \|
148.3s | 168 | \|     encoder.body.1.res_layer.4.bias      \|     64     \|
148.3s | 169 | \|  encoder.body.1.res_layer.5.fc1.weight   \|    256     \|
148.3s | 170 | \|  encoder.body.1.res_layer.5.fc2.weight   \|    256     \|
148.3s | 171 | \|    encoder.body.2.res_layer.0.weight     \|     64     \|
148.3s | 172 | \|     encoder.body.2.res_layer.0.bias      \|     64     \|
148.3s | 173 | \|    encoder.body.2.res_layer.1.weight     \|   36864    \|
148.3s | 174 | \|    encoder.body.2.res_layer.2.weight     \|     64     \|
148.3s | 175 | \|    encoder.body.2.res_layer.3.weight     \|   36864    \|
148.3s | 176 | \|    encoder.body.2.res_layer.4.weight     \|     64     \|
148.3s | 177 | \|     encoder.body.2.res_layer.4.bias      \|     64     \|
148.3s | 178 | \|  encoder.body.2.res_layer.5.fc1.weight   \|    256     \|
148.3s | 179 | \|  encoder.body.2.res_layer.5.fc2.weight   \|    256     \|
148.3s | 180 | \|  encoder.body.3.shortcut_layer.0.weight  \|    8192    \|
148.3s | 181 | \|  encoder.body.3.shortcut_layer.1.weight  \|    128     \|
148.3s | 182 | \|   encoder.body.3.shortcut_layer.1.bias   \|    128     \|
148.3s | 183 | \|    encoder.body.3.res_layer.0.weight     \|     64     \|
148.3s | 184 | \|     encoder.body.3.res_layer.0.bias      \|     64     \|
148.3s | 185 | \|    encoder.body.3.res_layer.1.weight     \|   73728    \|
148.3s | 186 | \|    encoder.body.3.res_layer.2.weight     \|    128     \|
148.3s | 187 | \|    encoder.body.3.res_layer.3.weight     \|   147456   \|
148.3s | 188 | \|    encoder.body.3.res_layer.4.weight     \|    128     \|
148.3s | 189 | \|     encoder.body.3.res_layer.4.bias      \|    128     \|
148.3s | 190 | \|  encoder.body.3.res_layer.5.fc1.weight   \|    1024    \|
148.3s | 191 | \|  encoder.body.3.res_layer.5.fc2.weight   \|    1024    \|
148.3s | 192 | \|    encoder.body.4.res_layer.0.weight     \|    128     \|
148.3s | 193 | \|     encoder.body.4.res_layer.0.bias      \|    128     \|
148.3s | 194 | \|    encoder.body.4.res_layer.1.weight     \|   147456   \|
148.3s | 195 | \|    encoder.body.4.res_layer.2.weight     \|    128     \|
148.3s | 196 | \|    encoder.body.4.res_layer.3.weight     \|   147456   \|
148.3s | 197 | \|    encoder.body.4.res_layer.4.weight     \|    128     \|
148.3s | 198 | \|     encoder.body.4.res_layer.4.bias      \|    128     \|
148.3s | 199 | \|  encoder.body.4.res_layer.5.fc1.weight   \|    1024    \|
148.3s | 200 | \|  encoder.body.4.res_layer.5.fc2.weight   \|    1024    \|
148.3s | 201 | \|    encoder.body.5.res_layer.0.weight     \|    128     \|
148.3s | 202 | \|     encoder.body.5.res_layer.0.bias      \|    128     \|
148.3s | 203 | \|    encoder.body.5.res_layer.1.weight     \|   147456   \|
148.3s | 204 | \|    encoder.body.5.res_layer.2.weight     \|    128     \|
148.3s | 205 | \|    encoder.body.5.res_layer.3.weight     \|   147456   \|
148.3s | 206 | \|    encoder.body.5.res_layer.4.weight     \|    128     \|
148.3s | 207 | \|     encoder.body.5.res_layer.4.bias      \|    128     \|
148.3s | 208 | \|  encoder.body.5.res_layer.5.fc1.weight   \|    1024    \|
148.3s | 209 | \|  encoder.body.5.res_layer.5.fc2.weight   \|    1024    \|
148.3s | 210 | \|    encoder.body.6.res_layer.0.weight     \|    128     \|
148.3s | 211 | \|     encoder.body.6.res_layer.0.bias      \|    128     \|
148.3s | 212 | \|    encoder.body.6.res_layer.1.weight     \|   147456   \|
148.3s | 213 | \|    encoder.body.6.res_layer.2.weight     \|    128     \|
148.3s | 214 | \|    encoder.body.6.res_layer.3.weight     \|   147456   \|
148.3s | 215 | \|    encoder.body.6.res_layer.4.weight     \|    128     \|
148.3s | 216 | \|     encoder.body.6.res_layer.4.bias      \|    128     \|
148.3s | 217 | \|  encoder.body.6.res_layer.5.fc1.weight   \|    1024    \|
148.3s | 218 | \|  encoder.body.6.res_layer.5.fc2.weight   \|    1024    \|
148.3s | 219 | \|  encoder.body.7.shortcut_layer.0.weight  \|   32768    \|
148.3s | 220 | \|  encoder.body.7.shortcut_layer.1.weight  \|    256     \|
148.3s | 221 | \|   encoder.body.7.shortcut_layer.1.bias   \|    256     \|
148.3s | 222 | \|    encoder.body.7.res_layer.0.weight     \|    128     \|
148.3s | 223 | \|     encoder.body.7.res_layer.0.bias      \|    128     \|
148.3s | 224 | \|    encoder.body.7.res_layer.1.weight     \|   294912   \|
148.3s | 225 | \|    encoder.body.7.res_layer.2.weight     \|    256     \|
148.3s | 226 | \|    encoder.body.7.res_layer.3.weight     \|   589824   \|
148.3s | 227 | \|    encoder.body.7.res_layer.4.weight     \|    256     \|
148.3s | 228 | \|     encoder.body.7.res_layer.4.bias      \|    256     \|
148.3s | 229 | \|  encoder.body.7.res_layer.5.fc1.weight   \|    4096    \|
148.3s | 230 | \|  encoder.body.7.res_layer.5.fc2.weight   \|    4096    \|
148.3s | 231 | \|    encoder.body.8.res_layer.0.weight     \|    256     \|
148.3s | 232 | \|     encoder.body.8.res_layer.0.bias      \|    256     \|
148.3s | 233 | \|    encoder.body.8.res_layer.1.weight     \|   589824   \|
148.3s | 234 | \|    encoder.body.8.res_layer.2.weight     \|    256     \|
148.3s | 235 | \|    encoder.body.8.res_layer.3.weight     \|   589824   \|
148.3s | 236 | \|    encoder.body.8.res_layer.4.weight     \|    256     \|
148.3s | 237 | \|     encoder.body.8.res_layer.4.bias      \|    256     \|
148.3s | 238 | \|  encoder.body.8.res_layer.5.fc1.weight   \|    4096    \|
148.3s | 239 | \|  encoder.body.8.res_layer.5.fc2.weight   \|    4096    \|
148.3s | 240 | \|    encoder.body.9.res_layer.0.weight     \|    256     \|
148.3s | 241 | \|     encoder.body.9.res_layer.0.bias      \|    256     \|
148.3s | 242 | \|    encoder.body.9.res_layer.1.weight     \|   589824   \|
148.3s | 243 | \|    encoder.body.9.res_layer.2.weight     \|    256     \|
148.3s | 244 | \|    encoder.body.9.res_layer.3.weight     \|   589824   \|
148.3s | 245 | \|    encoder.body.9.res_layer.4.weight     \|    256     \|
148.3s | 246 | \|     encoder.body.9.res_layer.4.bias      \|    256     \|
148.3s | 247 | \|  encoder.body.9.res_layer.5.fc1.weight   \|    4096    \|
148.3s | 248 | \|  encoder.body.9.res_layer.5.fc2.weight   \|    4096    \|
148.3s | 249 | \|    encoder.body.10.res_layer.0.weight    \|    256     \|
148.3s | 250 | \|     encoder.body.10.res_layer.0.bias     \|    256     \|
148.3s | 251 | \|    encoder.body.10.res_layer.1.weight    \|   589824   \|
148.3s | 252 | \|    encoder.body.10.res_layer.2.weight    \|    256     \|
148.3s | 253 | \|    encoder.body.10.res_layer.3.weight    \|   589824   \|
148.3s | 254 | \|    encoder.body.10.res_layer.4.weight    \|    256     \|
148.3s | 255 | \|     encoder.body.10.res_layer.4.bias     \|    256     \|
148.3s | 256 | \|  encoder.body.10.res_layer.5.fc1.weight  \|    4096    \|
148.3s | 257 | \|  encoder.body.10.res_layer.5.fc2.weight  \|    4096    \|
148.3s | 258 | \|    encoder.body.11.res_layer.0.weight    \|    256     \|
148.3s | 259 | \|     encoder.body.11.res_layer.0.bias     \|    256     \|
148.3s | 260 | \|    encoder.body.11.res_layer.1.weight    \|   589824   \|
148.3s | 261 | \|    encoder.body.11.res_layer.2.weight    \|    256     \|
148.3s | 262 | \|    encoder.body.11.res_layer.3.weight    \|   589824   \|
148.3s | 263 | \|    encoder.body.11.res_layer.4.weight    \|    256     \|
148.3s | 264 | \|     encoder.body.11.res_layer.4.bias     \|    256     \|
148.3s | 265 | \|  encoder.body.11.res_layer.5.fc1.weight  \|    4096    \|
148.3s | 266 | \|  encoder.body.11.res_layer.5.fc2.weight  \|    4096    \|
148.3s | 267 | \|    encoder.body.12.res_layer.0.weight    \|    256     \|
148.3s | 268 | \|     encoder.body.12.res_layer.0.bias     \|    256     \|
148.3s | 269 | \|    encoder.body.12.res_layer.1.weight    \|   589824   \|
148.3s | 270 | \|    encoder.body.12.res_layer.2.weight    \|    256     \|
148.3s | 271 | \|    encoder.body.12.res_layer.3.weight    \|   589824   \|
148.3s | 272 | \|    encoder.body.12.res_layer.4.weight    \|    256     \|
148.3s | 273 | \|     encoder.body.12.res_layer.4.bias     \|    256     \|
148.3s | 274 | \|  encoder.body.12.res_layer.5.fc1.weight  \|    4096    \|
148.3s | 275 | \|  encoder.body.12.res_layer.5.fc2.weight  \|    4096    \|
148.3s | 276 | \|    encoder.body.13.res_layer.0.weight    \|    256     \|
148.3s | 277 | \|     encoder.body.13.res_layer.0.bias     \|    256     \|
148.3s | 278 | \|    encoder.body.13.res_layer.1.weight    \|   589824   \|
148.3s | 279 | \|    encoder.body.13.res_layer.2.weight    \|    256     \|
148.3s | 280 | \|    encoder.body.13.res_layer.3.weight    \|   589824   \|
148.3s | 281 | \|    encoder.body.13.res_layer.4.weight    \|    256     \|
148.3s | 282 | \|     encoder.body.13.res_layer.4.bias     \|    256     \|
148.3s | 283 | \|  encoder.body.13.res_layer.5.fc1.weight  \|    4096    \|
148.3s | 284 | \|  encoder.body.13.res_layer.5.fc2.weight  \|    4096    \|
148.3s | 285 | \|    encoder.body.14.res_layer.0.weight    \|    256     \|
148.3s | 286 | \|     encoder.body.14.res_layer.0.bias     \|    256     \|
148.3s | 287 | \|    encoder.body.14.res_layer.1.weight    \|   589824   \|
148.3s | 288 | \|    encoder.body.14.res_layer.2.weight    \|    256     \|
148.3s | 289 | \|    encoder.body.14.res_layer.3.weight    \|   589824   \|
148.3s | 290 | \|    encoder.body.14.res_layer.4.weight    \|    256     \|
148.3s | 291 | \|     encoder.body.14.res_layer.4.bias     \|    256     \|
148.3s | 292 | \|  encoder.body.14.res_layer.5.fc1.weight  \|    4096    \|
148.3s | 293 | \|  encoder.body.14.res_layer.5.fc2.weight  \|    4096    \|
148.3s | 294 | \|    encoder.body.15.res_layer.0.weight    \|    256     \|
148.3s | 295 | \|     encoder.body.15.res_layer.0.bias     \|    256     \|
148.3s | 296 | \|    encoder.body.15.res_layer.1.weight    \|   589824   \|
148.3s | 297 | \|    encoder.body.15.res_layer.2.weight    \|    256     \|
148.3s | 298 | \|    encoder.body.15.res_layer.3.weight    \|   589824   \|
148.3s | 299 | \|    encoder.body.15.res_layer.4.weight    \|    256     \|
148.3s | 300 | \|     encoder.body.15.res_layer.4.bias     \|    256     \|
148.3s | 301 | \|  encoder.body.15.res_layer.5.fc1.weight  \|    4096    \|
148.3s | 302 | \|  encoder.body.15.res_layer.5.fc2.weight  \|    4096    \|
148.3s | 303 | \|    encoder.body.16.res_layer.0.weight    \|    256     \|
148.3s | 304 | \|     encoder.body.16.res_layer.0.bias     \|    256     \|
148.3s | 305 | \|    encoder.body.16.res_layer.1.weight    \|   589824   \|
148.3s | 306 | \|    encoder.body.16.res_layer.2.weight    \|    256     \|
148.3s | 307 | \|    encoder.body.16.res_layer.3.weight    \|   589824   \|
148.3s | 308 | \|    encoder.body.16.res_layer.4.weight    \|    256     \|
148.3s | 309 | \|     encoder.body.16.res_layer.4.bias     \|    256     \|
148.3s | 310 | \|  encoder.body.16.res_layer.5.fc1.weight  \|    4096    \|
148.3s | 311 | \|  encoder.body.16.res_layer.5.fc2.weight  \|    4096    \|
148.3s | 312 | \|    encoder.body.17.res_layer.0.weight    \|    256     \|
148.3s | 313 | \|     encoder.body.17.res_layer.0.bias     \|    256     \|
148.3s | 314 | \|    encoder.body.17.res_layer.1.weight    \|   589824   \|
148.3s | 315 | \|    encoder.body.17.res_layer.2.weight    \|    256     \|
148.3s | 316 | \|    encoder.body.17.res_layer.3.weight    \|   589824   \|
148.3s | 317 | \|    encoder.body.17.res_layer.4.weight    \|    256     \|
148.3s | 318 | \|     encoder.body.17.res_layer.4.bias     \|    256     \|
148.3s | 319 | \|  encoder.body.17.res_layer.5.fc1.weight  \|    4096    \|
148.3s | 320 | \|  encoder.body.17.res_layer.5.fc2.weight  \|    4096    \|
148.3s | 321 | \|    encoder.body.18.res_layer.0.weight    \|    256     \|
148.3s | 322 | \|     encoder.body.18.res_layer.0.bias     \|    256     \|
148.3s | 323 | \|    encoder.body.18.res_layer.1.weight    \|   589824   \|
148.3s | 324 | \|    encoder.body.18.res_layer.2.weight    \|    256     \|
148.3s | 325 | \|    encoder.body.18.res_layer.3.weight    \|   589824   \|
148.3s | 326 | \|    encoder.body.18.res_layer.4.weight    \|    256     \|
148.3s | 327 | \|     encoder.body.18.res_layer.4.bias     \|    256     \|
148.3s | 328 | \|  encoder.body.18.res_layer.5.fc1.weight  \|    4096    \|
148.3s | 329 | \|  encoder.body.18.res_layer.5.fc2.weight  \|    4096    \|
148.3s | 330 | \|    encoder.body.19.res_layer.0.weight    \|    256     \|
148.3s | 331 | \|     encoder.body.19.res_layer.0.bias     \|    256     \|
148.3s | 332 | \|    encoder.body.19.res_layer.1.weight    \|   589824   \|
148.3s | 333 | \|    encoder.body.19.res_layer.2.weight    \|    256     \|
148.3s | 334 | \|    encoder.body.19.res_layer.3.weight    \|   589824   \|
148.3s | 335 | \|    encoder.body.19.res_layer.4.weight    \|    256     \|
148.3s | 336 | \|     encoder.body.19.res_layer.4.bias     \|    256     \|
148.3s | 337 | \|  encoder.body.19.res_layer.5.fc1.weight  \|    4096    \|
148.3s | 338 | \|  encoder.body.19.res_layer.5.fc2.weight  \|    4096    \|
148.3s | 339 | \|    encoder.body.20.res_layer.0.weight    \|    256     \|
148.3s | 340 | \|     encoder.body.20.res_layer.0.bias     \|    256     \|
148.3s | 341 | \|    encoder.body.20.res_layer.1.weight    \|   589824   \|
148.3s | 342 | \|    encoder.body.20.res_layer.2.weight    \|    256     \|
148.3s | 343 | \|    encoder.body.20.res_layer.3.weight    \|   589824   \|
148.3s | 344 | \|    encoder.body.20.res_layer.4.weight    \|    256     \|
148.3s | 345 | \|     encoder.body.20.res_layer.4.bias     \|    256     \|
148.3s | 346 | \|  encoder.body.20.res_layer.5.fc1.weight  \|    4096    \|
148.3s | 347 | \|  encoder.body.20.res_layer.5.fc2.weight  \|    4096    \|
148.3s | 348 | \| encoder.body.21.shortcut_layer.0.weight  \|   131072   \|
148.3s | 349 | \| encoder.body.21.shortcut_layer.1.weight  \|    512     \|
148.3s | 350 | \|  encoder.body.21.shortcut_layer.1.bias   \|    512     \|
148.3s | 351 | \|    encoder.body.21.res_layer.0.weight    \|    256     \|
148.3s | 352 | \|     encoder.body.21.res_layer.0.bias     \|    256     \|
148.3s | 353 | \|    encoder.body.21.res_layer.1.weight    \|  1179648   \|
148.3s | 354 | \|    encoder.body.21.res_layer.2.weight    \|    512     \|
148.3s | 355 | \|    encoder.body.21.res_layer.3.weight    \|  2359296   \|
148.3s | 356 | \|    encoder.body.21.res_layer.4.weight    \|    512     \|
148.3s | 357 | \|     encoder.body.21.res_layer.4.bias     \|    512     \|
148.3s | 358 | \|  encoder.body.21.res_layer.5.fc1.weight  \|   16384    \|
148.3s | 359 | \|  encoder.body.21.res_layer.5.fc2.weight  \|   16384    \|
148.3s | 360 | \|    encoder.body.22.res_layer.0.weight    \|    512     \|
148.3s | 361 | \|     encoder.body.22.res_layer.0.bias     \|    512     \|
148.3s | 362 | \|    encoder.body.22.res_layer.1.weight    \|  2359296   \|
148.3s | 363 | \|    encoder.body.22.res_layer.2.weight    \|    512     \|
148.3s | 364 | \|    encoder.body.22.res_layer.3.weight    \|  2359296   \|
148.3s | 365 | \|    encoder.body.22.res_layer.4.weight    \|    512     \|
148.3s | 366 | \|     encoder.body.22.res_layer.4.bias     \|    512     \|
148.3s | 367 | \|  encoder.body.22.res_layer.5.fc1.weight  \|   16384    \|
148.3s | 368 | \|  encoder.body.22.res_layer.5.fc2.weight  \|   16384    \|
148.3s | 369 | \|    encoder.body.23.res_layer.0.weight    \|    512     \|
148.3s | 370 | \|     encoder.body.23.res_layer.0.bias     \|    512     \|
148.3s | 371 | \|    encoder.body.23.res_layer.1.weight    \|  2359296   \|
148.3s | 372 | \|    encoder.body.23.res_layer.2.weight    \|    512     \|
148.3s | 373 | \|    encoder.body.23.res_layer.3.weight    \|  2359296   \|
148.3s | 374 | \|    encoder.body.23.res_layer.4.weight    \|    512     \|
148.3s | 375 | \|     encoder.body.23.res_layer.4.bias     \|    512     \|
148.3s | 376 | \|  encoder.body.23.res_layer.5.fc1.weight  \|   16384    \|
148.3s | 377 | \|  encoder.body.23.res_layer.5.fc2.weight  \|   16384    \|
148.3s | 378 | \|     encoder.styles.0.convs.0.weight      \|  2359296   \|
148.3s | 379 | \|      encoder.styles.0.convs.0.bias       \|    512     \|
148.3s | 380 | \|     encoder.styles.0.convs.2.weight      \|  2359296   \|
148.3s | 381 | \|      encoder.styles.0.convs.2.bias       \|    512     \|
148.3s | 382 | \|     encoder.styles.0.convs.4.weight      \|  2359296   \|
148.3s | 383 | \|      encoder.styles.0.convs.4.bias       \|    512     \|
148.3s | 384 | \|     encoder.styles.0.convs.6.weight      \|  2359296   \|
148.3s | 385 | \|      encoder.styles.0.convs.6.bias       \|    512     \|
148.3s | 386 | \|      encoder.styles.0.linear.weight      \|   262144   \|
148.3s | 387 | \|       encoder.styles.0.linear.bias       \|    512     \|
148.3s | 388 | \|     encoder.styles.1.convs.0.weight      \|  2359296   \|
148.3s | 389 | \|      encoder.styles.1.convs.0.bias       \|    512     \|
148.3s | 390 | \|     encoder.styles.1.convs.2.weight      \|  2359296   \|
148.3s | 391 | \|      encoder.styles.1.convs.2.bias       \|    512     \|
148.3s | 392 | \|     encoder.styles.1.convs.4.weight      \|  2359296   \|
148.3s | 393 | \|      encoder.styles.1.convs.4.bias       \|    512     \|
148.3s | 394 | \|     encoder.styles.1.convs.6.weight      \|  2359296   \|
148.3s | 395 | \|      encoder.styles.1.convs.6.bias       \|    512     \|
148.3s | 396 | \|      encoder.styles.1.linear.weight      \|   262144   \|
148.3s | 397 | \|       encoder.styles.1.linear.bias       \|    512     \|
148.3s | 398 | \|     encoder.styles.2.convs.0.weight      \|  2359296   \|
148.3s | 399 | \|      encoder.styles.2.convs.0.bias       \|    512     \|
148.3s | 400 | \|     encoder.styles.2.convs.2.weight      \|  2359296   \|
148.3s | 401 | \|      encoder.styles.2.convs.2.bias       \|    512     \|
148.3s | 402 | \|     encoder.styles.2.convs.4.weight      \|  2359296   \|
148.3s | 403 | \|      encoder.styles.2.convs.4.bias       \|    512     \|
148.3s | 404 | \|     encoder.styles.2.convs.6.weight      \|  2359296   \|
148.3s | 405 | \|      encoder.styles.2.convs.6.bias       \|    512     \|
148.3s | 406 | \|      encoder.styles.2.linear.weight      \|   262144   \|
148.3s | 407 | \|       encoder.styles.2.linear.bias       \|    512     \|
148.3s | 408 | \|     encoder.styles.3.convs.0.weight      \|  2359296   \|
148.3s | 409 | \|      encoder.styles.3.convs.0.bias       \|    512     \|
148.3s | 410 | \|     encoder.styles.3.convs.2.weight      \|  2359296   \|
148.3s | 411 | \|      encoder.styles.3.convs.2.bias       \|    512     \|
148.3s | 412 | \|     encoder.styles.3.convs.4.weight      \|  2359296   \|
148.3s | 413 | \|      encoder.styles.3.convs.4.bias       \|    512     \|
148.3s | 414 | \|     encoder.styles.3.convs.6.weight      \|  2359296   \|
148.3s | 415 | \|      encoder.styles.3.convs.6.bias       \|    512     \|
148.3s | 416 | \|      encoder.styles.3.linear.weight      \|   262144   \|
148.3s | 417 | \|       encoder.styles.3.linear.bias       \|    512     \|
148.3s | 418 | \|     encoder.styles.4.convs.0.weight      \|  2359296   \|
148.3s | 419 | \|      encoder.styles.4.convs.0.bias       \|    512     \|
148.3s | 420 | \|     encoder.styles.4.convs.2.weight      \|  2359296   \|
148.3s | 421 | \|      encoder.styles.4.convs.2.bias       \|    512     \|
148.3s | 422 | \|     encoder.styles.4.convs.4.weight      \|  2359296   \|
148.3s | 423 | \|      encoder.styles.4.convs.4.bias       \|    512     \|
148.3s | 424 | \|     encoder.styles.4.convs.6.weight      \|  2359296   \|
148.3s | 425 | \|      encoder.styles.4.convs.6.bias       \|    512     \|
148.3s | 426 | \|      encoder.styles.4.linear.weight      \|   262144   \|
148.3s | 427 | \|       encoder.styles.4.linear.bias       \|    512     \|
148.3s | 428 | \|     encoder.styles.5.convs.0.weight      \|  2359296   \|
148.3s | 429 | \|      encoder.styles.5.convs.0.bias       \|    512     \|
148.3s | 430 | \|     encoder.styles.5.convs.2.weight      \|  2359296   \|
148.3s | 431 | \|      encoder.styles.5.convs.2.bias       \|    512     \|
148.3s | 432 | \|     encoder.styles.5.convs.4.weight      \|  2359296   \|
148.3s | 433 | \|      encoder.styles.5.convs.4.bias       \|    512     \|
148.3s | 434 | \|     encoder.styles.5.convs.6.weight      \|  2359296   \|
148.3s | 435 | \|      encoder.styles.5.convs.6.bias       \|    512     \|
148.3s | 436 | \|      encoder.styles.5.linear.weight      \|   262144   \|
148.3s | 437 | \|       encoder.styles.5.linear.bias       \|    512     \|
148.3s | 438 | \|     encoder.styles.6.convs.0.weight      \|  2359296   \|
148.3s | 439 | \|      encoder.styles.6.convs.0.bias       \|    512     \|
148.3s | 440 | \|     encoder.styles.6.convs.2.weight      \|  2359296   \|
148.3s | 441 | \|      encoder.styles.6.convs.2.bias       \|    512     \|
148.3s | 442 | \|     encoder.styles.6.convs.4.weight      \|  2359296   \|
148.3s | 443 | \|      encoder.styles.6.convs.4.bias       \|    512     \|
148.3s | 444 | \|     encoder.styles.6.convs.6.weight      \|  2359296   \|
148.3s | 445 | \|      encoder.styles.6.convs.6.bias       \|    512     \|
148.3s | 446 | \|      encoder.styles.6.linear.weight      \|   262144   \|
148.3s | 447 | \|       encoder.styles.6.linear.bias       \|    512     \|
148.3s | 448 | \|     encoder.styles.7.convs.0.weight      \|  2359296   \|
148.3s | 449 | \|      encoder.styles.7.convs.0.bias       \|    512     \|
148.3s | 450 | \|     encoder.styles.7.convs.2.weight      \|  2359296   \|
148.3s | 451 | \|      encoder.styles.7.convs.2.bias       \|    512     \|
148.3s | 452 | \|     encoder.styles.7.convs.4.weight      \|  2359296   \|
148.3s | 453 | \|      encoder.styles.7.convs.4.bias       \|    512     \|
148.3s | 454 | \|     encoder.styles.7.convs.6.weight      \|  2359296   \|
148.3s | 455 | \|      encoder.styles.7.convs.6.bias       \|    512     \|
148.3s | 456 | \|      encoder.styles.7.linear.weight      \|   262144   \|
148.3s | 457 | \|       encoder.styles.7.linear.bias       \|    512     \|
148.3s | 458 | \|     encoder.styles.8.convs.0.weight      \|  2359296   \|
148.3s | 459 | \|      encoder.styles.8.convs.0.bias       \|    512     \|
148.3s | 460 | \|     encoder.styles.8.convs.2.weight      \|  2359296   \|
148.3s | 461 | \|      encoder.styles.8.convs.2.bias       \|    512     \|
148.3s | 462 | \|     encoder.styles.8.convs.4.weight      \|  2359296   \|
148.3s | 463 | \|      encoder.styles.8.convs.4.bias       \|    512     \|
148.3s | 464 | \|     encoder.styles.8.convs.6.weight      \|  2359296   \|
148.3s | 465 | \|      encoder.styles.8.convs.6.bias       \|    512     \|
148.3s | 466 | \|      encoder.styles.8.linear.weight      \|   262144   \|
148.3s | 467 | \|       encoder.styles.8.linear.bias       \|    512     \|
148.3s | 468 | \|     encoder.styles.9.convs.0.weight      \|  2359296   \|
148.3s | 469 | \|      encoder.styles.9.convs.0.bias       \|    512     \|
148.3s | 470 | \|     encoder.styles.9.convs.2.weight      \|  2359296   \|
148.3s | 471 | \|      encoder.styles.9.convs.2.bias       \|    512     \|
148.3s | 472 | \|     encoder.styles.9.convs.4.weight      \|  2359296   \|
148.3s | 473 | \|      encoder.styles.9.convs.4.bias       \|    512     \|
148.3s | 474 | \|     encoder.styles.9.convs.6.weight      \|  2359296   \|
148.3s | 475 | \|      encoder.styles.9.convs.6.bias       \|    512     \|
148.3s | 476 | \|      encoder.styles.9.linear.weight      \|   262144   \|
148.3s | 477 | \|       encoder.styles.9.linear.bias       \|    512     \|
148.3s | 478 | \|     encoder.styles.10.convs.0.weight     \|  2359296   \|
148.3s | 479 | \|      encoder.styles.10.convs.0.bias      \|    512     \|
148.3s | 480 | \|     encoder.styles.10.convs.2.weight     \|  2359296   \|
148.3s | 481 | \|      encoder.styles.10.convs.2.bias      \|    512     \|
148.3s | 482 | \|     encoder.styles.10.convs.4.weight     \|  2359296   \|
148.3s | 483 | \|      encoder.styles.10.convs.4.bias      \|    512     \|
148.3s | 484 | \|     encoder.styles.10.convs.6.weight     \|  2359296   \|
148.3s | 485 | \|      encoder.styles.10.convs.6.bias      \|    512     \|
148.3s | 486 | \|     encoder.styles.10.linear.weight      \|   262144   \|
148.3s | 487 | \|      encoder.styles.10.linear.bias       \|    512     \|
148.3s | 488 | \|     encoder.styles.11.convs.0.weight     \|  2359296   \|
148.3s | 489 | \|      encoder.styles.11.convs.0.bias      \|    512     \|
148.3s | 490 | \|     encoder.styles.11.convs.2.weight     \|  2359296   \|
148.3s | 491 | \|      encoder.styles.11.convs.2.bias      \|    512     \|
148.3s | 492 | \|     encoder.styles.11.convs.4.weight     \|  2359296   \|
148.3s | 493 | \|      encoder.styles.11.convs.4.bias      \|    512     \|
148.3s | 494 | \|     encoder.styles.11.convs.6.weight     \|  2359296   \|
148.3s | 495 | \|      encoder.styles.11.convs.6.bias      \|    512     \|
148.3s | 496 | \|     encoder.styles.11.linear.weight      \|   262144   \|
148.3s | 497 | \|      encoder.styles.11.linear.bias       \|    512     \|
148.3s | 498 | \|     encoder.styles.12.convs.0.weight     \|  2359296   \|
148.3s | 499 | \|      encoder.styles.12.convs.0.bias      \|    512     \|
148.3s | 500 | \|     encoder.styles.12.convs.2.weight     \|  2359296   \|
148.3s | 501 | \|      encoder.styles.12.convs.2.bias      \|    512     \|
148.3s | 502 | \|     encoder.styles.12.convs.4.weight     \|  2359296   \|
148.3s | 503 | \|      encoder.styles.12.convs.4.bias      \|    512     \|
148.3s | 504 | \|     encoder.styles.12.convs.6.weight     \|  2359296   \|
148.3s | 505 | \|      encoder.styles.12.convs.6.bias      \|    512     \|
148.3s | 506 | \|     encoder.styles.12.linear.weight      \|   262144   \|
148.3s | 507 | \|      encoder.styles.12.linear.bias       \|    512     \|
148.3s | 508 | \|     encoder.styles.13.convs.0.weight     \|  2359296   \|
148.3s | 509 | \|      encoder.styles.13.convs.0.bias      \|    512     \|
148.3s | 510 | \|     encoder.styles.13.convs.2.weight     \|  2359296   \|
148.3s | 511 | \|      encoder.styles.13.convs.2.bias      \|    512     \|
148.3s | 512 | \|     encoder.styles.13.convs.4.weight     \|  2359296   \|
148.3s | 513 | \|      encoder.styles.13.convs.4.bias      \|    512     \|
148.3s | 514 | \|     encoder.styles.13.convs.6.weight     \|  2359296   \|
148.3s | 515 | \|      encoder.styles.13.convs.6.bias      \|    512     \|
148.3s | 516 | \|     encoder.styles.13.linear.weight      \|   262144   \|
148.3s | 517 | \|      encoder.styles.13.linear.bias       \|    512     \|
148.3s | 518 | \|     encoder.styles.14.convs.0.weight     \|  2359296   \|
148.3s | 519 | \|      encoder.styles.14.convs.0.bias      \|    512     \|
148.3s | 520 | \|     encoder.styles.14.convs.2.weight     \|  2359296   \|
148.3s | 521 | \|      encoder.styles.14.convs.2.bias      \|    512     \|
148.3s | 522 | \|     encoder.styles.14.convs.4.weight     \|  2359296   \|
148.3s | 523 | \|      encoder.styles.14.convs.4.bias      \|    512     \|
148.3s | 524 | \|     encoder.styles.14.convs.6.weight     \|  2359296   \|
148.3s | 525 | \|      encoder.styles.14.convs.6.bias      \|    512     \|
148.3s | 526 | \|     encoder.styles.14.linear.weight      \|   262144   \|
148.3s | 527 | \|      encoder.styles.14.linear.bias       \|    512     \|
148.3s | 528 | \|     encoder.styles.15.convs.0.weight     \|  2359296   \|
148.3s | 529 | \|      encoder.styles.15.convs.0.bias      \|    512     \|
148.3s | 530 | \|     encoder.styles.15.convs.2.weight     \|  2359296   \|
148.3s | 531 | \|      encoder.styles.15.convs.2.bias      \|    512     \|
148.3s | 532 | \|     encoder.styles.15.convs.4.weight     \|  2359296   \|
148.3s | 533 | \|      encoder.styles.15.convs.4.bias      \|    512     \|
148.3s | 534 | \|     encoder.styles.15.convs.6.weight     \|  2359296   \|
148.3s | 535 | \|      encoder.styles.15.convs.6.bias      \|    512     \|
148.3s | 536 | \|     encoder.styles.15.linear.weight      \|   262144   \|
148.3s | 537 | \|      encoder.styles.15.linear.bias       \|    512     \|
148.3s | 538 | \|     encoder.styles.16.convs.0.weight     \|  2359296   \|
148.3s | 539 | \|      encoder.styles.16.convs.0.bias      \|    512     \|
148.3s | 540 | \|     encoder.styles.16.convs.2.weight     \|  2359296   \|
148.3s | 541 | \|      encoder.styles.16.convs.2.bias      \|    512     \|
148.3s | 542 | \|     encoder.styles.16.convs.4.weight     \|  2359296   \|
148.3s | 543 | \|      encoder.styles.16.convs.4.bias      \|    512     \|
148.3s | 544 | \|     encoder.styles.16.convs.6.weight     \|  2359296   \|
148.3s | 545 | \|      encoder.styles.16.convs.6.bias      \|    512     \|
148.3s | 546 | \|     encoder.styles.16.linear.weight      \|   262144   \|
148.3s | 547 | \|      encoder.styles.16.linear.bias       \|    512     \|
148.3s | 548 | \|     encoder.styles.17.convs.0.weight     \|  2359296   \|
148.3s | 549 | \|      encoder.styles.17.convs.0.bias      \|    512     \|
148.3s | 550 | \|     encoder.styles.17.convs.2.weight     \|  2359296   \|
148.3s | 551 | \|      encoder.styles.17.convs.2.bias      \|    512     \|
148.3s | 552 | \|     encoder.styles.17.convs.4.weight     \|  2359296   \|
148.3s | 553 | \|      encoder.styles.17.convs.4.bias      \|    512     \|
148.3s | 554 | \|     encoder.styles.17.convs.6.weight     \|  2359296   \|
148.3s | 555 | \|      encoder.styles.17.convs.6.bias      \|    512     \|
148.3s | 556 | \|     encoder.styles.17.linear.weight      \|   262144   \|
148.3s | 557 | \|      encoder.styles.17.linear.bias       \|    512     \|
148.3s | 558 | \|          decoder.style.1.weight          \|   262144   \|
148.3s | 559 | \|           decoder.style.1.bias           \|    512     \|
148.3s | 560 | \|          decoder.style.2.weight          \|   262144   \|
148.3s | 561 | \|           decoder.style.2.bias           \|    512     \|
148.3s | 562 | \|          decoder.style.3.weight          \|   262144   \|
148.3s | 563 | \|           decoder.style.3.bias           \|    512     \|
148.3s | 564 | \|          decoder.style.4.weight          \|   262144   \|
148.3s | 565 | \|           decoder.style.4.bias           \|    512     \|
148.3s | 566 | \|          decoder.style.5.weight          \|   262144   \|
148.3s | 567 | \|           decoder.style.5.bias           \|    512     \|
148.3s | 568 | \|          decoder.style.6.weight          \|   262144   \|
148.3s | 569 | \|           decoder.style.6.bias           \|    512     \|
148.3s | 570 | \|          decoder.style.7.weight          \|   262144   \|
148.3s | 571 | \|           decoder.style.7.bias           \|    512     \|
148.3s | 572 | \|          decoder.style.8.weight          \|   262144   \|
148.3s | 573 | \|           decoder.style.8.bias           \|    512     \|
148.3s | 574 | \|           decoder.input.input            \|    8192    \|
148.3s | 575 | \|        decoder.conv1.conv.weight         \|  2359296   \|
148.3s | 576 | \|   decoder.conv1.conv.modulation.weight   \|   262144   \|
148.3s | 577 | \|    decoder.conv1.conv.modulation.bias    \|    512     \|
148.3s | 578 | \|        decoder.conv1.noise.weight        \|     1      \|
148.3s | 579 | \|       decoder.conv1.activate.bias        \|    512     \|
148.3s | 580 | \|           decoder.to_rgb1.bias           \|     3      \|
148.3s | 581 | \|       decoder.to_rgb1.conv.weight        \|    1536    \|
148.3s | 582 | \|  decoder.to_rgb1.conv.modulation.weight  \|   262144   \|
148.3s | 583 | \|   decoder.to_rgb1.conv.modulation.bias   \|    512     \|
148.3s | 584 | \|       decoder.convs.0.conv.weight        \|  2359296   \|
148.3s | 585 | \|  decoder.convs.0.conv.modulation.weight  \|   262144   \|
148.3s | 586 | \|   decoder.convs.0.conv.modulation.bias   \|    512     \|
148.3s | 587 | \|       decoder.convs.0.noise.weight       \|     1      \|
148.3s | 588 | \|      decoder.convs.0.activate.bias       \|    512     \|
148.3s | 589 | \|       decoder.convs.1.conv.weight        \|  2359296   \|
148.3s | 590 | \|  decoder.convs.1.conv.modulation.weight  \|   262144   \|
148.3s | 591 | \|   decoder.convs.1.conv.modulation.bias   \|    512     \|
148.3s | 592 | \|       decoder.convs.1.noise.weight       \|     1      \|
148.3s | 593 | \|      decoder.convs.1.activate.bias       \|    512     \|
148.3s | 594 | \|       decoder.convs.2.conv.weight        \|  2359296   \|
148.3s | 595 | \|  decoder.convs.2.conv.modulation.weight  \|   262144   \|
148.3s | 596 | \|   decoder.convs.2.conv.modulation.bias   \|    512     \|
148.3s | 597 | \|       decoder.convs.2.noise.weight       \|     1      \|
148.3s | 598 | \|      decoder.convs.2.activate.bias       \|    512     \|
148.3s | 599 | \|       decoder.convs.3.conv.weight        \|  2359296   \|
148.3s | 600 | \|  decoder.convs.3.conv.modulation.weight  \|   262144   \|
148.3s | 601 | \|   decoder.convs.3.conv.modulation.bias   \|    512     \|
148.3s | 602 | \|       decoder.convs.3.noise.weight       \|     1      \|
148.3s | 603 | \|      decoder.convs.3.activate.bias       \|    512     \|
148.3s | 604 | \|       decoder.convs.4.conv.weight        \|  2359296   \|
148.3s | 605 | \|  decoder.convs.4.conv.modulation.weight  \|   262144   \|
148.3s | 606 | \|   decoder.convs.4.conv.modulation.bias   \|    512     \|
148.3s | 607 | \|       decoder.convs.4.noise.weight       \|     1      \|
148.3s | 608 | \|      decoder.convs.4.activate.bias       \|    512     \|
148.3s | 609 | \|       decoder.convs.5.conv.weight        \|  2359296   \|
148.3s | 610 | \|  decoder.convs.5.conv.modulation.weight  \|   262144   \|
148.3s | 611 | \|   decoder.convs.5.conv.modulation.bias   \|    512     \|
148.3s | 612 | \|       decoder.convs.5.noise.weight       \|     1      \|
148.3s | 613 | \|      decoder.convs.5.activate.bias       \|    512     \|
148.3s | 614 | \|       decoder.convs.6.conv.weight        \|  2359296   \|
148.3s | 615 | \|  decoder.convs.6.conv.modulation.weight  \|   262144   \|
148.3s | 616 | \|   decoder.convs.6.conv.modulation.bias   \|    512     \|
148.3s | 617 | \|       decoder.convs.6.noise.weight       \|     1      \|
148.3s | 618 | \|      decoder.convs.6.activate.bias       \|    512     \|
148.3s | 619 | \|       decoder.convs.7.conv.weight        \|  2359296   \|
148.3s | 620 | \|  decoder.convs.7.conv.modulation.weight  \|   262144   \|
148.3s | 621 | \|   decoder.convs.7.conv.modulation.bias   \|    512     \|
148.3s | 622 | \|       decoder.convs.7.noise.weight       \|     1      \|
148.3s | 623 | \|      decoder.convs.7.activate.bias       \|    512     \|
148.3s | 624 | \|       decoder.convs.8.conv.weight        \|  1179648   \|
148.3s | 625 | \|  decoder.convs.8.conv.modulation.weight  \|   262144   \|
148.3s | 626 | \|   decoder.convs.8.conv.modulation.bias   \|    512     \|
148.3s | 627 | \|       decoder.convs.8.noise.weight       \|     1      \|
148.3s | 628 | \|      decoder.convs.8.activate.bias       \|    256     \|
148.3s | 629 | \|       decoder.convs.9.conv.weight        \|   589824   \|
148.3s | 630 | \|  decoder.convs.9.conv.modulation.weight  \|   131072   \|
148.3s | 631 | \|   decoder.convs.9.conv.modulation.bias   \|    256     \|
148.3s | 632 | \|       decoder.convs.9.noise.weight       \|     1      \|
148.3s | 633 | \|      decoder.convs.9.activate.bias       \|    256     \|
148.3s | 634 | \|       decoder.convs.10.conv.weight       \|   294912   \|
148.3s | 635 | \| decoder.convs.10.conv.modulation.weight  \|   131072   \|
148.3s | 636 | \|  decoder.convs.10.conv.modulation.bias   \|    256     \|
148.3s | 637 | \|      decoder.convs.10.noise.weight       \|     1      \|
148.3s | 638 | \|      decoder.convs.10.activate.bias      \|    128     \|
148.3s | 639 | \|       decoder.convs.11.conv.weight       \|   147456   \|
148.3s | 640 | \| decoder.convs.11.conv.modulation.weight  \|   65536    \|
148.3s | 641 | \|  decoder.convs.11.conv.modulation.bias   \|    128     \|
148.3s | 642 | \|      decoder.convs.11.noise.weight       \|     1      \|
148.3s | 643 | \|      decoder.convs.11.activate.bias      \|    128     \|
148.3s | 644 | \|       decoder.convs.12.conv.weight       \|   73728    \|
148.3s | 645 | \| decoder.convs.12.conv.modulation.weight  \|   65536    \|
148.3s | 646 | \|  decoder.convs.12.conv.modulation.bias   \|    128     \|
148.3s | 647 | \|      decoder.convs.12.noise.weight       \|     1      \|
148.3s | 648 | \|      decoder.convs.12.activate.bias      \|     64     \|
148.3s | 649 | \|       decoder.convs.13.conv.weight       \|   36864    \|
148.3s | 650 | \| decoder.convs.13.conv.modulation.weight  \|   32768    \|
148.3s | 651 | \|  decoder.convs.13.conv.modulation.bias   \|     64     \|
148.3s | 652 | \|      decoder.convs.13.noise.weight       \|     1      \|
148.3s | 653 | \|      decoder.convs.13.activate.bias      \|     64     \|
148.3s | 654 | \|       decoder.convs.14.conv.weight       \|   18432    \|
148.3s | 655 | \| decoder.convs.14.conv.modulation.weight  \|   32768    \|
148.3s | 656 | \|  decoder.convs.14.conv.modulation.bias   \|     64     \|
148.3s | 657 | \|      decoder.convs.14.noise.weight       \|     1      \|
148.3s | 658 | \|      decoder.convs.14.activate.bias      \|     32     \|
148.3s | 659 | \|       decoder.convs.15.conv.weight       \|    9216    \|
148.3s | 660 | \| decoder.convs.15.conv.modulation.weight  \|   16384    \|
148.3s | 661 | \|  decoder.convs.15.conv.modulation.bias   \|     32     \|
148.3s | 662 | \|      decoder.convs.15.noise.weight       \|     1      \|
148.3s | 663 | \|      decoder.convs.15.activate.bias      \|     32     \|
148.3s | 664 | \|          decoder.to_rgbs.0.bias          \|     3      \|
148.3s | 665 | \|      decoder.to_rgbs.0.conv.weight       \|    1536    \|
148.3s | 666 | \| decoder.to_rgbs.0.conv.modulation.weight \|   262144   \|
148.3s | 667 | \|  decoder.to_rgbs.0.conv.modulation.bias  \|    512     \|
148.3s | 668 | \|          decoder.to_rgbs.1.bias          \|     3      \|
148.3s | 669 | \|      decoder.to_rgbs.1.conv.weight       \|    1536    \|
148.3s | 670 | \| decoder.to_rgbs.1.conv.modulation.weight \|   262144   \|
148.3s | 671 | \|  decoder.to_rgbs.1.conv.modulation.bias  \|    512     \|
148.3s | 672 | \|          decoder.to_rgbs.2.bias          \|     3      \|
148.3s | 673 | \|      decoder.to_rgbs.2.conv.weight       \|    1536    \|
148.3s | 674 | \| decoder.to_rgbs.2.conv.modulation.weight \|   262144   \|
148.3s | 675 | \|  decoder.to_rgbs.2.conv.modulation.bias  \|    512     \|
148.3s | 676 | \|          decoder.to_rgbs.3.bias          \|     3      \|
148.3s | 677 | \|      decoder.to_rgbs.3.conv.weight       \|    1536    \|
148.3s | 678 | \| decoder.to_rgbs.3.conv.modulation.weight \|   262144   \|
148.3s | 679 | \|  decoder.to_rgbs.3.conv.modulation.bias  \|    512     \|
148.3s | 680 | \|          decoder.to_rgbs.4.bias          \|     3      \|
148.3s | 681 | \|      decoder.to_rgbs.4.conv.weight       \|    768     \|
148.3s | 682 | \| decoder.to_rgbs.4.conv.modulation.weight \|   131072   \|
148.3s | 683 | \|  decoder.to_rgbs.4.conv.modulation.bias  \|    256     \|
148.3s | 684 | \|          decoder.to_rgbs.5.bias          \|     3      \|
148.3s | 685 | \|      decoder.to_rgbs.5.conv.weight       \|    384     \|
148.3s | 686 | \| decoder.to_rgbs.5.conv.modulation.weight \|   65536    \|
148.3s | 687 | \|  decoder.to_rgbs.5.conv.modulation.bias  \|    128     \|
148.3s | 688 | \|          decoder.to_rgbs.6.bias          \|     3      \|
148.3s | 689 | \|      decoder.to_rgbs.6.conv.weight       \|    192     \|
148.3s | 690 | \| decoder.to_rgbs.6.conv.modulation.weight \|   32768    \|
148.3s | 691 | \|  decoder.to_rgbs.6.conv.modulation.bias  \|     64     \|
148.3s | 692 | \|          decoder.to_rgbs.7.bias          \|     3      \|
148.3s | 693 | \|      decoder.to_rgbs.7.conv.weight       \|     96     \|
148.3s | 694 | \| decoder.to_rgbs.7.conv.modulation.weight \|   16384    \|
148.3s | 695 | \|  decoder.to_rgbs.7.conv.modulation.bias  \|     32     \|
148.3s | 696 | +------------------------------------------+------------+
148.3s | 697 | Total Trainable Params: 235955852
154.1s | 698 | Downloading: "https://download.pytorch.org/models/alexnet-owt-7be5be79.pth" to /root/.cache/torch/hub/checkpoints/alexnet-owt-7be5be79.pth
162.8s | 699 | 100%\|████████████████████████████████████████\| 233M/233M [00:08<00:00, 29.7MB/s]
163.0s | 700 | Downloading: "https://raw.githubusercontent.com/richzhang/PerceptualSimilarity/master/lpips/weights/v0.1/alex.pth" to /root/.cache/torch/hub/checkpoints/alex.pth
163.3s | 701 | 100%\|██████████████████████████████████████\| 5.87k/5.87k [00:00<00:00, 4.88MB/s]
163.3s | 702 | Loading ResNet ArcFace
163.9s | 703 | Loading dataset for my_ffhq_encode
311.9s | 704 | Number of training samples: 70000
311.9s | 705 | Number of test samples: 70000
316.0s | 706 | Changed progressive stage to:  ProgressiveStage.WTraining
334.0s | 707 | ./training/ranger.py:123: UserWarning: This overload of addcmul_ is deprecated:
334.0s | 708 | addcmul_(Number value, Tensor tensor1, Tensor tensor2)
334.0s | 709 | Consider using one of the following signatures instead:
334.0s | 710 | addcmul_(Tensor tensor1, Tensor tensor2, *, Number value) (Triggered internally at  /usr/local/src/pytorch/torch/csrc/utils/python_arg_parser.cpp:1025.)
334.0s | 711 | exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
335.1s | 712 | Metrics for train, step 0
335.1s | 713 | d_real_loss =  0.6707150936126709
335.1s | 714 | d_fake_loss =  0.7088415622711182
335.1s | 715 | discriminator_loss =  1.379556655883789
335.1s | 716 | discriminator_r1_loss =  0.11644786596298218
335.1s | 717 | encoder_discriminator_loss =  0.6753013730049133
335.1s | 718 | total_delta_loss =  0.0
335.1s | 719 | loss_id =  0.9233516454696655
335.1s | 720 | id_improve =  -0.9233517416287214
335.1s | 721 | loss_l2 =  0.29226595163345337
335.1s | 722 | loss_lpips =  0.4791472554206848
335.1s | 723 | loss =  0.8354490995407104
1150.6s | 724 | Metrics for train, step 50
1150.6s | 725 | d_real_loss =  0.5728033781051636
1150.6s | 726 | d_fake_loss =  0.7072099447250366
1150.6s | 727 | discriminator_loss =  1.2800133228302002
1150.6s | 728 | encoder_discriminator_loss =  0.6189665794372559
1150.6s | 729 | total_delta_loss =  0.0
1150.6s | 730 | loss_id =  1.0349425077438354
1150.6s | 731 | id_improve =  -1.0349424059968442
1150.6s | 732 | loss_l2 =  0.3499150276184082
1150.6s | 733 | loss_lpips =  0.5180467963218689
1150.6s | 734 | loss =  0.9297434091567993
1920.9s | 735 | Metrics for train, step 100
1920.9s | 736 | d_real_loss =  0.46433788537979126
1920.9s | 737 | d_fake_loss =  0.7012063264846802
1920.9s | 738 | discriminator_loss =  1.1655442714691162
1920.9s | 739 | encoder_discriminator_loss =  0.5394962430000305
1920.9s | 740 | total_delta_loss =  0.0
1920.9s | 741 | loss_id =  0.9708028435707092
1920.9s | 742 | id_improve =  -0.9708028591703624
1920.9s | 743 | loss_l2 =  0.27206945419311523
1920.9s | 744 | loss_lpips =  0.4531230926513672
1920.9s | 745 | loss =  0.7855978608131409
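
As a sanity check on the metrics above, the logged `loss` can be reproduced as a weighted sum of the individual terms. The sketch below is only an illustration: the weights (0.1 for `loss_id`, 1.0 for `loss_l2`, 0.8 for `loss_lpips`, 0.1 for `encoder_discriminator_loss`) are not stated in this log and are assumptions, chosen because they reproduce the logged totals. `total_delta_loss` is 0.0 at these steps and is left out. The helper names are hypothetical.

```python
# Values copied from the training output above (steps 0, 50 and 100).
LOGGED = {
    0: {"encoder_discriminator_loss": 0.6753013730049133,
        "loss_id": 0.9233516454696655,
        "loss_l2": 0.29226595163345337,
        "loss_lpips": 0.4791472554206848,
        "loss": 0.8354490995407104},
    50: {"encoder_discriminator_loss": 0.6189665794372559,
         "loss_id": 1.0349425077438354,
         "loss_l2": 0.3499150276184082,
         "loss_lpips": 0.5180467963218689,
         "loss": 0.9297434091567993},
    100: {"encoder_discriminator_loss": 0.5394962430000305,
          "loss_id": 0.9708028435707092,
          "loss_l2": 0.27206945419311523,
          "loss_lpips": 0.4531230926513672,
          "loss": 0.7855978608131409},
}

# ASSUMPTION: these weights do not appear in the log; they are guesses
# that happen to match the logged totals.
ASSUMED_WEIGHTS = {
    "loss_id": 0.1,
    "loss_l2": 1.0,
    "loss_lpips": 0.8,
    "encoder_discriminator_loss": 0.1,
}


def weighted_total(metrics, weights=ASSUMED_WEIGHTS):
    """Weighted sum of the individual loss terms for one logged step."""
    return sum(weight * metrics[name] for name, weight in weights.items())


def max_abs_error(logged=LOGGED):
    """Largest gap between the recomputed total and the logged `loss`."""
    return max(abs(weighted_total(m) - m["loss"]) for m in logged.values())


if __name__ == "__main__":
    for step in sorted(LOGGED):
        print(step, weighted_total(LOGGED[step]), LOGGED[step]["loss"])
```

If the recomputed and logged values diverge for a different run, the weights assumed here simply do not apply to that configuration.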

Question about training ReStyle on a different dataset

Hello again.

Recently, I've decided to train your ReStyle on another dataset. The question is: do I need to train StyleGAN, or any other model that ReStyle uses, on the same dataset?

Thanks in advance for the answer.

opened by DaddyWesker 16

Error when loading converted ada-pytorch model

Hello and thank you for sharing your fascinating work!

I'm trying to use ReStyle with a pretrained stylegan-ada-pytorch model. I followed the conversion script (thanks btw!) and have my .pt model file ready. Unfortunately, when I try to run training using the following command:

python scripts/train_restyle_psp.py --dataset_type=buildings --encoder_type=BackboneEncoder --exp_dir=experiment/restyle_psp_ffhq_encode --workers=8 --batch_size=8 --test_batch_size=8 --test_workers=8 --val_interval=5000 --save_interval=10000 --start_from_latent_avg --lpips_lambda=0.8 --l2_lambda=1 --w_norm_lambda=0 --id_lambda=0.1 --input_nc=6 --n_iters_per_batch=5 --output_size=512 --stylegan_weights=F:\Experimentation\Generative_models\GANs\StyleGAN2\pretrained_models\rosalinity\buildings_5kimg_upsampled.pt

I get the following error when loading the pretrained model:

  File "scripts/train_restyle_psp.py", line 30, in <module>
    main()
  File "scripts/train_restyle_psp.py", line 25, in main
    coach = Coach(opts)
  File "F:\Experimentation\Generative_models\GANs\StyleGAN2\GAN_editing\restyle-encoder\training\coach_restyle_psp.py", line 31, in __init__
    self.net = pSp(self.opts).to(self.device)
  File "F:\Experimentation\Generative_models\GANs\StyleGAN2\GAN_editing\restyle-encoder\models\psp.py", line 25, in __init__
    self.load_weights()
  File "F:\Experimentation\Generative_models\GANs\StyleGAN2\GAN_editing\restyle-encoder\models\psp.py", line 52, in load_weights
    self.decoder.load_state_dict(ckpt['g_ema'], strict=True)
  File "C:\Users\user\miniconda3\envs\archelites\lib\site-packages\torch\nn\modules\module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Generator:
        Missing key(s) in state_dict: "style.3.weight", "style.3.bias", "style.4.weight", "style.4.bias", "style.5.weight", "style.5.bias", "style.6.weight", "style.6.bias", "style.7.weight", "style.7.bias", "style.8.weight", "style.8.bias".

Any idea why I get a mismatch here? Thanks!

opened by TheodoreGalanos 13
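
The error above lists keys that the model expects but the checkpoint does not contain. One rough way to diagnose this kind of mismatch is to compare the two key sets and count the mapping-network layers on each side. The sketch below uses plain Python collections; `diff_keys` and `count_mapping_layers` are hypothetical helpers, and the key pattern `style.<index>.weight` is taken from the error message. In practice the two key lists would come from the model's `state_dict()` and from `ckpt['g_ema']`.

```python
import re


def diff_keys(expected_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, both sorted."""
    expected, found = set(expected_keys), set(checkpoint_keys)
    return sorted(expected - found), sorted(found - expected)


def count_mapping_layers(keys):
    """Count distinct indices N among keys named 'style.N.weight'."""
    pattern = re.compile(r"^style\.(\d+)\.weight$")
    indices = set()
    for key in keys:
        match = pattern.match(key)
        if match:
            indices.add(int(match.group(1)))
    return len(indices)


def mapping_keys(first, last):
    """Build illustrative 'style.N.weight' / 'style.N.bias' key names."""
    keys = []
    for i in range(first, last + 1):
        keys += [f"style.{i}.weight", f"style.{i}.bias"]
    return keys


if __name__ == "__main__":
    # ASSUMPTION: the model expects layers 1..8 and the checkpoint holds
    # only layers 1..2, which is consistent with the missing keys above.
    expected = mapping_keys(1, 8)
    checkpoint = mapping_keys(1, 2)
    missing, unexpected = diff_keys(expected, checkpoint)
    print("missing:", missing)
    print("unexpected:", unexpected)
    print(count_mapping_layers(expected), count_mapping_layers(checkpoint))
```

If the layer counts differ, the generator may have been trained with a different mapping-network depth than the one the model class is built with. That is an inference from the key names, not something the issue confirms.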

The model collapses when using the MoCo loss

Hi Yuval,

When I train on my datasets with the configuration you recommended, the model seems to collapse after just a few iterations (as shown below). However, when I set the MoCo lambda to 0.0001, there has been no problem so far. Do you have any clue why that happens? Is there a way to disable the MoCo loss entirely? (When I set the MoCo lambda to 0, it seems you bind the id_logs with the moco_loss calculation.)

opened by tommy-qichang 12

Improving toonification result

Hi, I was wondering what we can do to improve the toonification result. I tested the encoder bootstrapping method using the following command: python scripts/encoder_bootstrapping_inference.py --exp_dir=./toonify --model_1_checkpoint_path=./pretrained/restyle_psp_ffhq_encode.pt --model_2_checkpoint_path=./pretrained/restyle_psp_toonify.pt --data_path=./test/test_A --test_batch_size=1 --test_workers=1 --n_iters_per_batch=1

I get decent results, but I would like the output to look more like the input image.

A sample of the results I am getting.

opened by Nerdyvedi 12

Failed to run Colab notebook

Downloading ReStyle model for toonify...

ValueError Traceback (most recent call last) in ()
      4 # if google drive receives too many requests, we'll reach the quota limit and be unable to download the model
      5 if os.path.getsize(EXPERIMENT_ARGS['model_path']) < 1000000:
----> 6   raise ValueError("Pretrained model was unable to be downloaded correctly!")
      7 else:
      8   print('Done.')

ValueError: Pretrained model was unable to be downloaded correctly!
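
The notebook check in the traceback above only compares the file size against 1000000 bytes. A slightly more informative check could also report why the file looks wrong. The sketch below is a hypothetical helper, not part of the notebook: `check_download` is an invented name, and the HTML test rests on the assumption that a failed download may leave a small web page on disk instead of the checkpoint.

```python
import os

MIN_MODEL_BYTES = 1_000_000  # same threshold as the notebook check above


def check_download(path, min_bytes=MIN_MODEL_BYTES):
    """Return (ok, reason) for a downloaded checkpoint file."""
    if not os.path.isfile(path):
        return False, "file does not exist"
    size = os.path.getsize(path)
    if size < min_bytes:
        # ASSUMPTION: a failed download may have saved an HTML page.
        with open(path, "rb") as handle:
            head = handle.read(512).lstrip().lower()
        if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
            return False, "file looks like an HTML page, not a checkpoint"
        return False, f"file is only {size} bytes"
    return True, "ok"


if __name__ == "__main__":
    # The path below is a placeholder for illustration only.
    ok, reason = check_download("pretrained_models/restyle_psp_toonify.pt")
    print(ok, reason)
```

When the check fails, downloading the model manually and placing it at the expected path is one way around a quota limit.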

opened by Mustafa744 11

Are the trained weights valid only for the training set?

Hello, when I used inference_iterative.py based on restyle_e4e_ffhq_encode.pt, I found that it was only effective for images in the training set, such as FFHQ, and did not work well for some other faces.

opened by sky-fly97 10

Inference on trained model does not reflect test or train output during model training

After training an image translation model (similar to the toonify translation of a real face to a toonified face using a blended network) with paired images, the train output images look good. The test ones are hit and miss. When using the best_model or the latest model for inference, both output essentially the latent average, not the appropriate translated image (which I have obtained by optimization with the FFHQ model and then passing the latent through the blended network).

I've tried more iterations per batch when running inference, but that doesn't seem to really change the output. Justin Pinkney used pix2pixHD to do the direct image translation for his toonification example. Does that approach not work using restyle-psp instead of pix2pixHD?

For inference, I'm using scripts/inference_iterative_save_coupled.py and scripts/inference_iterative.py.

opened by dboshardy 9

Generate the image according to the latent code

Hello, after performing latent code editing, I want to generate the edited image. Can this be achieved using this code, and what format should my latent code have? My current latent code is a numpy.ndarray of shape (1, 18, 512). Regarding my code for generating the image from the latent code: I do the latent code editing based on StyleGAN2. Looking forward to your reply.

opened by zhanghongyong123456 9

Trains for infinite time duration regardless of `max_steps`

opened by RahulBhalley 7

Image-to-Image Translation using ReStyle

Hey! First of all, great work! I just wanted to ask whether there is any documentation for trying the ReStyle encoder for image-to-image translation. I am working on generating real images from sketches in a non-facial domain and have already tried the vanilla pSp image-to-image translation pipeline. I saw your comment here saying that the ReStyle encoder is better suited for non-facial domains and thus wanted to try it. I have already tried setting the source and the target in the data config to the folders of sketch and real images, respectively. It generates images as attached below. Shouldn't the input here be a sketch image? Thanks in advance!

opened by abhisheklalwani 7

The visual change/delta at the last iteration is larger than the preceding steps

I observed that in many images the change between the last image and the one before it is significant (sometimes better, sometimes worse, as far as the human eye can tell). This is with the pretrained FFHQ ReStyle pSp model. The two examples below are from CelebA-HQ (10.jpg and 1000.jpg). Why would this be the case? (I am using 20 iterations below.)

Thanks

opened by yaseryacoob 7

How to express sadness

Hello! Thank you for your research. Because I know little about GANs, I have a question: I want to generate more expressions, such as sadness. How can I do this? Thank you for your help.

opened by wudidecc 0

Official Implementation for "ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement" https://arxiv.org/abs/2104.02699
