[CVPR 2021] Teachers Do More Than Teach: Compressing Image-to-Image Models (CAT)

Snap Research

Last update: Dec 9, 2022

CAT

PyTorch implementation of our method for compressing image-to-image models.
Teachers Do More Than Teach: Compressing Image-to-Image Models
Qing Jin¹, Jian Ren², Oliver J. Woodford, Jiazhuo Wang², Geng Yuan¹, Yanzhi Wang¹, Sergey Tulyakov²
¹Northeastern University, ²Snap Inc.
In CVPR 2021.

Overview

Compression And Teaching (CAT) framework for compressing image-to-image models: ① Given a pre-trained teacher generator Gt, we determine the architecture of a compressed student generator Gs by eliminating the channels whose batch norm scaling factors have the smallest magnitudes. ② We then distill knowledge from the pre-trained teacher Gt into the student Gs via a novel distillation technique that maximizes the similarity between features of both generators, defined in terms of kernel alignment (KA).
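
As a rough illustration of the similarity measure named above, the sketch below computes an uncentered kernel alignment with a linear kernel between two sets of per-sample feature vectors. It is a minimal sketch, not the repository's implementation: the function names and the way features are flattened into one vector per sample are assumptions made for this example.

```python
import math

def gram(features):
    """Linear-kernel Gram matrix: K[i][j] = <features[i], features[j]>."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in features]
            for row_i in features]

def frobenius_inner(a, b):
    """Sum of element-wise products of two equally sized matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def kernel_alignment(teacher_feats, student_feats):
    """Uncentered kernel alignment between two feature sets.

    Each argument holds n flattened feature vectors (one per sample).
    The two sets may have different feature dimensions (assumed layout).
    """
    k_t = gram(teacher_feats)
    k_s = gram(student_feats)
    denom = math.sqrt(frobenius_inner(k_t, k_t) * frobenius_inner(k_s, k_s))
    return frobenius_inner(k_t, k_s) / denom

# Toy features: 3 samples, teacher has 3 channels, student has 2.
teacher = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0]]
student = [[0.5, 1.0], [1.5, 0.2], [0.3, 0.9]]
print(kernel_alignment(teacher, teacher))  # identical features align perfectly
print(kernel_alignment(teacher, student))  # a value between 0 and 1
```

Because the measure compares sample-by-sample Gram matrices, the teacher and student can have different channel counts, which is what makes it usable after pruning.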

Prerequisites

Linux
Python 3
CPU or NVIDIA GPU + CUDA CuDNN

Getting Started

Installation

Clone this repo:

git clone git@github.com:snap-research/CAT.git
cd CAT

Install PyTorch 1.7 and other dependencies (e.g., torchvision).
- For pip users, please type the command pip install -r requirements.txt.
- For Conda users, please create a new Conda environment using conda env create -f environment.yml.

Data Preparation

CycleGAN

Setup

Download the CycleGAN dataset (e.g., horse2zebra).

bash datasets/download_cyclegan_dataset.sh horse2zebra

Get the statistical information for the ground-truth images of your dataset to compute FID. We provide precomputed real-image statistics for several datasets in a Google Drive folder.

Pix2pix

Setup

Download the pix2pix dataset (e.g., cityscapes).

bash datasets/download_pix2pix_dataset.sh cityscapes

Cityscapes Dataset

For the Cityscapes dataset, we cannot provide the data due to licensing restrictions. Please download the dataset from https://cityscapes-dataset.com and use the script prepare_cityscapes_dataset.py to preprocess it. You need to download gtFine_trainvaltest.zip and leftImg8bit_trainvaltest.zip and unzip them in the same folder. For example, you may put gtFine and leftImg8bit in database/cityscapes-origin. Then prepare the dataset with the following commands:

python datasets/get_trainIds.py database/cityscapes-origin/gtFine/
python datasets/prepare_cityscapes_dataset.py \
--gtFine_dir database/cityscapes-origin/gtFine \
--leftImg8bit_dir database/cityscapes-origin/leftImg8bit \
--output_dir database/cityscapes \
--table_path datasets/table.txt

You will get a preprocessed dataset in database/cityscapes and a mapping table (used to compute mIoU) in datasets/table.txt.

Get the statistical information for the ground-truth images for your dataset to compute FID. We provide pre-prepared real statistics for several datasets. For example,
```
bash datasets/download_real_stat.sh cityscapes A
```

Evaluation Preparation

mIoU Computation

To support mIoU computation, you need to download a pre-trained DRN model drn-d-105_ms_cityscapes.pth from http://go.yf.io/drn-cityscapes-models. By default, we put the DRN model in the root directory of our repo. Once you have also downloaded our compressed models, you can test them on Cityscapes.
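
For readers unfamiliar with the metric, here is a minimal sketch of how mean IoU can be computed from a confusion matrix. It only illustrates the metric itself. The repository's evaluation relies on the DRN model and the mapping table, and the function name `mean_iou` and the toy matrix are assumptions for this example.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    confusion[t][p] counts pixels with true class t predicted as class p.
    Classes absent from both labels and predictions are skipped.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                              # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # wrongly predicted as c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0

# Toy 3-class confusion matrix (rows: ground truth, columns: prediction).
confusion = [
    [50, 10, 0],
    [5, 30, 5],
    [0, 10, 40],
]
print(mean_iou(confusion))
```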

FID/KID Computation

To compute the FID/KID score, you need to get some statistical information from the ground-truth images of your dataset. We provide a script get_real_stat.py to extract this information. For example, for the map2arial dataset, you could run the following command:

python get_real_stat.py \
--dataroot database/map2arial \
--output_path real_stat/maps_B.npz \
--direction AtoB

For paired image-to-image translation (pix2pix and GauGAN), we calculate the FID between generated test images and real test images. For unpaired image-to-image translation (CycleGAN), we calculate the FID between generated test images and real training+test images. This allows us to use more images for a stable FID evaluation, as done in previous unconditional GAN research. The difference between the two protocols is small: the FID of our compressed CycleGAN model increases by 4 when using real test images instead of real training+test images.
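
To give a feel for what the real-image statistics are used for, the sketch below evaluates the Fréchet distance between two Gaussians in the simplified case of diagonal covariances. This is a deliberate simplification for illustration: the actual FID uses full covariance matrices of Inception features and a matrix square root, and the function name and toy numbers here are assumptions.

```python
import math

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with DIAGONAL covariances.

    mu*: per-dimension means, var*: per-dimension variances.
    Simplified special case; real FID uses full covariance matrices.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2)
                   for v1, v2 in zip(var1, var2))
    return mean_term + cov_term

# Toy "real" and "generated" statistics in two feature dimensions.
real_mu, real_var = [0.0, 0.0], [1.0, 1.0]
fake_mu, fake_var = [1.0, 0.0], [4.0, 1.0]
print(frechet_distance_diag(real_mu, real_var, fake_mu, fake_var))  # 2.0
```

Identical statistics give a distance of zero, which is why a lower FID indicates generated images whose feature distribution is closer to that of the real images.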

KID is not supported for the cityscapes dataset.

Model Training

Teacher Training

The first step of our framework is to train a teacher model. For this purpose, please run the script train_inception_teacher.sh under the corresponding folder named after the dataset. For example, run

bash scripts/cycle_gan/horse2zebra/train_inception_teacher.sh

Student Training

With the pretrained teacher model, we can determine the architecture of the student model under a prescribed computational budget. For this purpose, please run the script train_inception_student_XXX.sh under the corresponding folder named after the dataset, where XXX stands for the computational budget (in terms of FLOPs in this case) and can differ across datasets and models. For example, for CycleGAN with the Horse2Zebra dataset, our computational budget is 2.6B FLOPs, so we run

bash scripts/cycle_gan/horse2zebra/train_inception_student_2p6B.sh
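
The idea of picking a student architecture under a FLOPs budget can be sketched as a threshold search over batch norm scaling factors. The code below is a toy illustration under stated assumptions, not the repository's pruning code: the FLOPs model (a fixed cost per kept channel), the binary search, and all names are hypothetical.

```python
def flops_after_pruning(scales_per_layer, threshold, flops_per_channel):
    """Toy FLOPs model: each kept channel in layer i costs flops_per_channel[i]."""
    total = 0.0
    for scales, cost in zip(scales_per_layer, flops_per_channel):
        kept = sum(1 for s in scales if abs(s) > threshold)
        total += kept * cost
    return total

def find_threshold(scales_per_layer, flops_per_channel, target_flops, iters=50):
    """Binary-search the smallest threshold whose pruned model fits the budget."""
    lo = 0.0
    hi = max(abs(s) for scales in scales_per_layer for s in scales)  # prunes everything
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if flops_after_pruning(scales_per_layer, mid, flops_per_channel) > target_flops:
            lo = mid  # still too expensive: prune more channels
        else:
            hi = mid  # fits the budget: try pruning fewer channels
    return hi

# Two toy layers with their batch norm scaling factors and per-channel costs.
scales = [[0.9, 0.1, 0.5, 0.05], [0.8, 0.3, 0.02]]
cost = [100, 200]
threshold = find_threshold(scales, cost, target_flops=500)
kept = [[s for s in layer if abs(s) > threshold] for layer in scales]
print(kept)  # channels that survive pruning
print(flops_after_pruning(scales, threshold, cost))
```

In this toy case the unpruned model costs 1000 units, and the search keeps the three largest-magnitude channels, which fit the budget of 500.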

Pre-trained Models

For convenience, we also provide pretrained teacher and student models in a Google Drive folder.

Model Evaluation

With pretrained teacher and student models, we can evaluate them on the dataset. For this purpose, please run the script evaluate_inception_student_XXX.sh under the corresponding folder named after the dataset, where XXX is the computational budget (in terms of FLOPs). For example, for CycleGAN with the Horse2Zebra dataset, where the computational budget is 2.6B FLOPs, please run

bash scripts/cycle_gan/horse2zebra/evaluate_inception_student_2p6B.sh

Model Export

The final step is to export the trained compressed model as an ONNX file to run on mobile devices. For this purpose, please run the script onnx_export_inception_student_XXX.sh under the corresponding folder named after the dataset, where XXX is the computational budget (in terms of FLOPs). For example, for CycleGAN with the Horse2Zebra dataset, where the computational budget is 2.6B FLOPs, please run

bash scripts/cycle_gan/horse2zebra/onnx_export_inception_student_2p6B.sh

This will create one .onnx file in addition to log files.
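
The train, evaluate, and export scripts above share one naming pattern. The helper below sketches that pattern. Only the 2p6B example is confirmed by this README, so the general rule (decimal point written as `p`, billions written as `B`) and the function names are assumptions, and other budgets may be named differently in the repository.

```python
def budget_tag(flops):
    """Format a FLOPs budget such as 2.6e9 as '2p6B' (assumed convention)."""
    billions = flops / 1e9
    return f"{billions:.1f}".replace(".", "p") + "B"

def script_path(model, dataset, stage, flops):
    """Build a script path; stage is 'train', 'evaluate', or 'onnx_export'."""
    return f"scripts/{model}/{dataset}/{stage}_inception_student_{budget_tag(flops)}.sh"

for stage in ("train", "evaluate", "onnx_export"):
    print(script_path("cycle_gan", "horse2zebra", stage, 2.6e9))
```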

Citation

If you use this code for your research, please cite our paper.

@inproceedings{jin2021teachers,
  title={Teachers Do More Than Teach: Compressing Image-to-Image Models},
  author={Jin, Qing and Ren, Jian and Woodford, Oliver J and Wang, Jiazhuo and Yuan, Geng and Wang, Yanzhi and Tulyakov, Sergey},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

Acknowledgements

Our code is developed based on AtomNAS and gan-compression.

We also thank pytorch-fid for FID computation and drn for mIoU computation.

Comments

`get_real_stat.py` for custom dataset
Hello,

I am testing CycleGAN as the README describes. It seems I need .npz files for training. The command given is

!python get_real_stat.py \
  --dataroot database/horse2zebra \
  --output_path real_stat/horse2zebra_A.npz \
  --direction AtoB

and the --dataroot flag seems to require trainA, trainB, valA, valB plus train, val, etc. folders:

parser.add_argument(
    '--dataroot',
    required=True,
    help='path to images (should have subfolders trainA, trainB, valA, valB, train, val, etc)'
)

I see CycleGAN usually requires four folders, trainA, trainB, testA (valA), and testB (valB), but I am not sure what the train, val, etc. folders are for. Could you explain how to set up those folders? Or is there a way to train without .npz files?
opened by youjinChung 10

training question

Hello again,

I am training CycleGAN with ffhq dataset, trying to make a filter that makes the user smile. I checked with my training recently, but the result wasn't satisfying. So, I'd like to ask for advice from you to enhance the performance of the model.

Initially, I trained with 2500 unaligned pairs of smiling/not smiling 1024x1024 photos and got fid_B: 88.244 with --nepochs 500 --nepochs_decay 500. While training the teacher model, the best model was found at epoch 181 / 1000. I kept training until the 841st epoch but couldn't find a better one. For distilling, I got fid: 87.976 at epoch 270 with the options --nepochs 500 --nepochs_decay 500.

So I added 2500 more unaligned pairs to the dataset and am training again now. I am eager to make this filter look great. Any advice is appreciated.

Also, one more question: is it right to resume with the latest_net, not the best_net I have? With the same learning rate options, I guess resuming from the best net, or 180_net again, would just repeat what I already trained last time. Please let me know if my assumption is correct.

opened by youjinChung 6

Error when distilling CycleGan

Hi again, I am now trying to use CAT to train and distill a CycleGan transformation. The training worked well but I am about to start distilling with:

 !python distill.py --dataroot /local_path/ \
  --dataset_mode unaligned \
  --distiller inception \
  --gan_mode lsgan \
  --log_dir /local_path/student \
  --restore_pretrained_G_path /local_path/teacher/checkpoints/best_A_net_G_A.pth \
  --restore_teacher_G_path /local_path/teacher/checkpoints/best_A_net_G_A.pth \
  --restore_D_path /local_path/teacher/checkpoints/best_A_net_D_A.pth \
  --real_stat_path /local_path/out_stat_B.npz\
  --nepochs 500 --nepochs_decay 500 \
  --save_latest_freq 25000 --save_epoch_freq 25 \
  --teacher_netG inception_9blocks --student_netG inception_9blocks \
  --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 24 \
  --eval_batch_size 2 \
  --gpu_ids 0 \
  --norm batch \
  --norm_affine \
  --norm_affine_D \
  --norm_track_running_stats \
  --channels_reduction_factor 6 \
  --kernel_sizes 1 3 5 \
  --lambda_distill 1.0 \
  --prune_cin_lb 16 \
  --target_flops 6.6e9 \
  --distill_G_loss_type ka

But I keep getting a runtime error (similar to the one fixed in Issue #11 -> https://github.com/snap-research/CAT/issues/11):

Load network at /local_path/teacher/checkpoints/best_A_net_G_A.pth
Traceback (most recent call last):
  File "distill.py", line 13, in <module>
    trainer = Trainer('distill')
  File "/content/drive/MyDrive/CAT/trainer.py", line 80, in __init__
    model.setup(opt)
  File "/content/drive/MyDrive/CAT/distillers/base_inception_distiller.py", line 260, in setup
    self.load_networks(verbose)
  File "/content/drive/MyDrive/CAT/distillers/inception_distiller.py", line 192, in load_networks
    super(InceptionDistiller, self).load_networks(prune_continue)
  File "/content/drive/MyDrive/CAT/distillers/base_inception_distiller.py", line 362, in load_networks
    verbose)
  File "/content/drive/MyDrive/CAT/utils/util.py", line 139, in load_network
    net.load_state_dict(weights)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for InceptionGenerator:
	Missing key(s) in state_dict: "down_sampling.2.running_mean", "down_sampling.2.running_var",...

Lots of other missing keys.

Please let me know if there is something wrong or missing in my command line. (BTW, the teacher training produced 4 generator networks; which one should I use to distill the A-to-B generator?)

opened by jvillegassmule 6

Module Error in pix2pix distillation
I have built a fully functional pix2pix model using custom code. I downloaded the pre-trained model I made, but I am not able to compress it due to the following error.

RuntimeError: Error(s) in loading state_dict for InceptionGenerator:
	Missing key(s) in state_dict: "down_sampling.1.weight", "down_sampling.2.weight", "down_sampling.2.bias", .....
	Unexpected key(s) in state_dict: "final_conv.weight", "final_conv.bias", "encoders.0.conv.weight","encoders.1.bn.bias", "encoders.1.bn.running_mean",......

Just wanted to know if the distillation code works only for the architecture trained using the code in models/pix2pix_model.py and trainer.py. The code I used to train the pix2pix GAN is here.
opened by Ashish-Abraham 4

Errors when trying to resume distilling.

I have the image pairs for training ready in the /local_path dir. After successfully training the teacher with the command line:

!python train.py --dataroot /local_path \
  --model pix2pix \
  --log_dir /local_path/logs/teacher \
  --netG inception_9blocks \
  --lambda_recon 10 \
  --nepochs 500 --nepochs_decay 1000 \
  --norm batch \
  --norm_affine \
  --norm_affine_D \
  --norm_track_running_stats \
  --channels_reduction_factor 6 \
  --preprocess none \
  --kernel_sizes 1 3 5 \
  --save_epoch_freq 50 --save_latest_freq 20000 \
  --direction AtoB \
  --real_stat_path /local_path/out_stat.npz

I got a folder full of model checkpoints in /local_path/checkpoints. I was able to resume the training session with something like:

!python train.py --dataroot /local_path \
  --model pix2pix \
  --log_dir /local_path/logs/teacher \
  --netG inception_9blocks \
  --lambda_recon 10 \
  --nepochs 0 --nepochs_decay 750 \
  --norm batch \
  --norm_affine \
  --norm_affine_D \
  --norm_track_running_stats \
  --channels_reduction_factor 6 \
  --preprocess none \
  --kernel_sizes 1 3 5 \
  --save_epoch_freq 50 --save_latest_freq 20000 \
  --direction AtoB \
  --real_stat_path /local_path/out_stat.npz \
  --epoch_base 750 \
  --iter_base 300001 \
  --restore_G_path /local_path/logs/teacher/checkpoints/latest_net_G.pth \
  --restore_D_path /local_path/logs/teacher/checkpoints/latest_net_D.pth

After training, the results on the eval/(it_number)/fake folder are acceptable.

Then, I was able to run the distiller with the command:

 !python distill.py --dataroot /local_path \
  --distiller inception \
  --log_dir /local_path/logs/student \
  --restore_teacher_G_path /local_path/logs/teacher/checkpoints/best_net_G.pth \
  --restore_pretrained_G_path /local_path/logs/teacher/checkpoints/best_net_G.pth \
  --restore_D_path /local_path/logs/teacher/checkpoints/best_net_D.pth \
  --real_stat_path /local_path/out_stat.npz \
  --nepochs 500 --nepochs_decay 750 \
  --save_latest_freq 25000 --save_epoch_freq 25 \
  --teacher_netG inception_9blocks --student_netG inception_9blocks \
  --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 24 \
  --eval_batch_size 2 \
  --gpu_ids 0 \
  --norm batch \
  --norm_affine \
  --norm_affine_D \
  --norm_track_running_stats \
  --channels_reduction_factor 6 \
  --kernel_sizes 1 3 5 \
  --direction AtoB \
  --lambda_distill 2.0 \
  --prune_cin_lb 16 \
  --target_flops 2.6e9 \
  --distill_G_loss_type ka

I had to stop the session before it finished; again, different checkpoint models were saved in the folder /local_path/logs/student/checkpoints, including pth files for G, D, optim-0, optim-1, A-0, A1, A2 and A3. Progress seems OK in the local_path/logs/student/eval folder.

I tried to resume distilling with the command line:

!python distill.py --dataroot /local_path \
  --distiller inception \
  --log_dir /local_path/logs/student \
  --restore_teacher_G_path /local_path/logs/teacher/checkpoints/best_net_G.pth \
  --restore_pretrained_G_path /local_path/logs/student/checkpoints/latest_net_G.pth \
  --restore_D_path /local_path/logs/student/checkpoints/latest_net_D.pth \
  --restore_student_G_path /local_path/logs/student/checkpoints/latest_net_G.pth\
  --pretrained_student_G_path /local_path/logs/student/checkpoints/latest_net_G.pth\
  --restore_A_path /local_path/logs/student/checkpoints/latest_net_A \
  --restore_O_path /local_path/logs/student/checkpoints/latest_optim \
  --real_stat_path /local_path/out_stat.npz \
  --nepochs 0 --nepochs_decay 325 \
  --save_latest_freq 25000 --save_epoch_freq 25 \
  --teacher_netG inception_9blocks --student_netG inception_9blocks \
  --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 24 \
  --eval_batch_size 2 \
  --gpu_ids 0 \
  --norm batch \
  --norm_affine \
  --norm_affine_D \
  --norm_track_running_stats \
  --channels_reduction_factor 6 \
  --kernel_sizes 1 3 5 \
  --direction AtoB \
  --lambda_distill 2.0 \
  --prune_cin_lb 16 \
  --target_flops 2.6e9 \
  --distill_G_loss_type ka \
  --epoch_base 925 \
  --iter_base 370000

But now I get this error:

Load network at /local_path/logs/student/checkpoints/latest_net_G.pth
Traceback (most recent call last):
  File "distill.py", line 13, in <module>
    trainer = Trainer('distill')
  File "/content/CAT/trainer.py", line 80, in __init__
    model.setup(opt)
  File "/content/CAT/distillers/base_inception_distiller.py", line 260, in setup
    self.load_networks(verbose)
  File "/content/CAT/distillers/inception_distiller.py", line 203, in load_networks
    super(InceptionDistiller, self).load_networks()
  File "/content/CAT/distillers/base_inception_distiller.py", line 368, in load_networks
    self.opt.restore_student_G_path, verbose)
  File "/content/CAT/utils/util.py", line 139, in load_network
    net.load_state_dict(weights)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for InceptionGenerator:
	Missing key(s) in state_dict: "down_sampling.1.bias", "down_sampling.2.weight", "down_sampling.2.bias", "down_sampling.2.running_mean", "down_sampling.2.running_var", "down_sampling.2.num_batches_tracked", "down_sampling.4.bias", "down_sampling.5.weight", "down_sampling.5.bias", "down_sampling.5.running_mean", "down_sampling.5.running_var", "down_sampling.5.num_batches_tracked", "down_sampling.7.bias"

... lots of other missing layers, then

	size mismatch for down_sampling.1.weight: copying a param with shape torch.Size([16, 3, 7, 7]) from checkpoint, the shape in current model is torch.Size([24, 3, 7, 7]).
	size mismatch for down_sampling.4.weight: copying a param with shape torch.Size([16, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([48, 24, 3, 3]).
	size mismatch for down_sampling.7.weight: copying a param with shape torch.Size([210, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([96, 48, 3, 3]).

... lots of other size mismatches.

Seems to me that there is a mismatch between the network that was created internally and the one that is being used to fill it with the previously trained model. Not sure if it is a bug or if something is wrong in the command line I am using to resume.

Any help will be appreciated.

opened by jvillegassmule 3
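
When a load_state_dict error like the ones in these threads appears, it can help to compare parameter names and shapes directly. The helper below is a hypothetical diagnostic sketch, not part of the repository: it works on plain name-to-shape mappings, and the comment shows how such mappings could be built with PyTorch, assuming the checkpoint file stores a raw state dict.

```python
def diff_state_dicts(checkpoint_shapes, model_shapes):
    """Compare two {parameter_name: shape} mappings.

    Returns (missing, unexpected, mismatched), mirroring the three kinds
    of complaints in a load_state_dict error message.
    """
    missing = sorted(k for k in model_shapes if k not in checkpoint_shapes)
    unexpected = sorted(k for k in checkpoint_shapes if k not in model_shapes)
    mismatched = {
        k: (checkpoint_shapes[k], model_shapes[k])
        for k in checkpoint_shapes
        if k in model_shapes and checkpoint_shapes[k] != model_shapes[k]
    }
    return missing, unexpected, mismatched

# With PyTorch, the mappings could be built as (assumes a raw state dict on disk):
#   {k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu").items()}
#   {k: tuple(v.shape) for k, v in net.state_dict().items()}
# Weight shapes below are taken from the error message in the thread above.
checkpoint = {
    "down_sampling.1.weight": (16, 3, 7, 7),
    "down_sampling.4.weight": (16, 16, 3, 3),
}
model = {
    "down_sampling.1.weight": (24, 3, 7, 7),
    "down_sampling.1.bias": (24,),
    "down_sampling.4.weight": (48, 24, 3, 3),
}
missing, unexpected, mismatched = diff_state_dicts(checkpoint, model)
print("missing:", missing)
print("unexpected:", unexpected)
for name, (ckpt_shape, model_shape) in mismatched.items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

A report like this makes it easier to see whether the checkpoint and the freshly built network simply use different layer names, or whether the same layers exist with different channel counts.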

distilling freezes with large dimension

Hello,

Since my last CycleGAN model is too blurred, I brought a new dataset with 1024x1024. Now I am testing the pipeline and found that distilling freezes with my specified dimension.

  --preprocess resize_and_crop \
  --load_size 1024 \
  --crop_size 1024,1024 \

It didn't work with none preprocess as well.

 --preprocess none

this is the last log that I can see. distilling doesn't proceed after this point.

...
features.8.res_ops.0.1.1.weight
features.8.res_ops.1.1.1.weight
features.8.res_ops.2.1.1.weight
features.8.dw_ops.0.0.1.weight
features.8.dw_ops.1.0.1.weight
features.8.dw_ops.2.0.1.weight
scale range: [0.9916747808456421, 1.014320731163025]

I've tested with some other options:

- without those options (with default options)
- with the size of 500

  --preprocess resize_and_crop \
  --load_size 500 \
  --crop_size 500,500 \

and distilling worked with those cases.

I wonder if it froze because it is too large or something I misunderstood with the sizing. Also, I saw the options used 256x256 size as default, and other tutorial scripts used the same sizing, I wonder if I still need to go with this sizing with high-resolution face-changing filters. I am training face filters that make people smile.

Just in case, this is the whole option I used for distill.py

!python distill.py --dataroot database/face2smile \
  --dataset_mode unaligned \
  --distiller inception \
  --gan_mode lsgan \
  --log_dir logs/cycle_gan/face2smile/inception/student/2p6B \
  --restore_teacher_G_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_A_net_G_A.pth \
  --restore_pretrained_G_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_A_net_G_A.pth \
  --restore_D_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_A_net_D_A.pth \
  --real_stat_path real_stat/face2smile_B.npz \
  --teacher_netG inception_9blocks --student_netG inception_9blocks \
  --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 20 \
  --ndf 64 \
  --num_threads 80 \
  --eval_batch_size 4 \
  --batch_size 80 \
  --gpu_ids 0,1,2,3 \
  --norm_affine \
  --norm_affine_D \
  --channels_reduction_factor 6 \
  --kernel_sizes 1 3 5 \
  --lambda_distill 1.0 \
  --lambda_recon 5 \
  --prune_cin_lb 16 \
  --target_flops 2.6e9 \
  --distill_G_loss_type ka \
  --save_epoch_freq 1 \
  --save_latest_freq 500 \
  --norm_student batch \
  --padding_type_student zero \
  --norm_affine_student \
  --norm_track_running_stats_student \
  --preprocess resize_and_crop \
  --load_size 1024\
  --crop_size 500,500 \
  --nepochs 1 --nepochs_decay 0

opened by youjinChung 2

restore options and recommended epoch number
Hello,

Thanks to your help, I am training a CycleGAN with face images for a Snap lens. I found some restore options in the tutorial Jupyter notebook,

--restore_teacher_G_path ukiyo_teacher_iter68000_net_G_B.pth \
--restore_pretrained_G_path ukiyo_teacher_iter68000_net_G_B.pth \
--restore_D_path ukiyo_teacher_iter68000_net_D_B.pth \

but when I tried to find them in the options folder, I could only find --restore_O_path option from train_options.py so my questions are

right options for resuming training/ distilling

--epoch_base XXX \ # last epoch + 1?
--iter_base XXX \ # last iteration + 1?
--nepochs XXX \ # total goal epochs?
--nepochs_decay XXX \ # total goal epochs with lr decay?
# and are these options below only for distilling?
--restore_teacher_G_path xxx.pth \
--restore_pretrained_G_path xxx.pth \
--restore_D_path xxx.pth

nepochs / nepochs_decay recommendation

When I check the scripts folder, I see --nepochs 500 --nepochs_decay 500. I am training with 4 GPUs, but it still takes quite a while. I wonder how many epochs are recommended for desirable results.
opened by youjinChung 2

get_real_stat.py for custom dataset
I am working on a pix2pix model and ran this code to get real_stats.npz file with the folder structure specified in get_real_stat.py

!python get_real_stat.py \
  --dataroot database/sketch/trainA \
  --dataset_mode single \
  --output_path real_stat/sketch_stats_train_A.npz \
  --direction BtoA \
  --phase train

!python get_real_stat.py \
  --dataroot database/sketch/trainB \
  --dataset_mode single \
  --output_path real_stat/sketch_stats_train_B.npz \
  --direction BtoA \
  --phase train

I am facing the error

Traceback (most recent call last):
  File "get_real_stat.py", line 146, in <module>
    main(opt)
  File "get_real_stat.py", line 40, in main
    tensors = torch.cat(tensors, dim=0)
RuntimeError: Sizes of tensors must match except in dimension 0. Got 256 and 560 in dimension 2 (The offending index is 909)
Should I edit the dataloader for this, or is it something else? Any help is appreciated, please.
opened by Ashish-Abraham 1
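
The error in this thread comes from torch.cat, which requires all tensors to have the same size in every dimension except the one being concatenated. One way to locate the offending images is to collect their sizes and list those that differ from the most common size. The helper below is a hypothetical sketch for that check; the function name and the toy sizes are made up for illustration.

```python
from collections import Counter

def find_odd_shapes(shapes):
    """Return (index, shape) pairs whose shape differs from the most common one."""
    most_common_shape, _ = Counter(shapes).most_common(1)[0]
    return [(i, s) for i, s in enumerate(shapes) if s != most_common_shape]

# Toy (height, width) sizes. With Pillow, sizes could be read via
# Image.open(path).size, which returns (width, height).
sizes = [(256, 256), (256, 256), (560, 256), (256, 256)]
print(find_odd_shapes(sizes))  # [(2, (560, 256))]
```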

custom cyclegan training epoch and data preparation

Hello,

I trained a CycleGAN with CelebA dataset. My goal is to make not smiling faces smile. I checked this dataset with another cycleGAN repo, and it worked pretty well with 300 epochs. So I trained with CAT cycleGAN --nepochs 160 --nepochs_decay 160 for training and distilling. I read that other scripts such as horse2zebra has --nepochs 500 --nepochs_decay 500, so I guess my training epochs were too small, but I'd like to ask your opinion first.

So I can only see a slight sign that the CycleGAN blurred the male's mouth. Another concern is that the filter is too blurred and looks low resolution. So my questions are:

Do you think that is due to the small number of epochs I trained?

Let me share the commands I used for training. I also wonder if my get_real_stat command is wrong:

!python get_real_stat.py --dataroot database/face2smile/valB --output_path real_stat/face2smile_B.npz --direction AtoB --dataset_mode single
!python get_real_stat.py --dataroot database/face2smile/valA --output_path real_stat/face2smile_A.npz --direction BtoA --dataset_mode single

!python train.py --dataroot database/face2smile \
  --model cycle_gan \
  --log_dir logs/cycle_gan/face2smile/inception/teacher \
  --netG inception_9blocks \
  --real_stat_A_path real_stat/face2smile_A.npz \
  --real_stat_B_path real_stat/face2smile_B.npz \
  --batch_size 16 \
  --restore_G_A_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/140_net_G_A.pth\
  --restore_D_A_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/140_net_D_A.pth\
  --restore_G_B_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/140_net_G_B.pth\
  --restore_D_B_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/140_net_D_B.pth\
  --epoch_base 141 \
  --nepochs 10 --nepochs_decay 140 \
  --num_threads 16 \
  --gpu_ids 0,1,2,3 \
  --norm_affine \
  --norm_affine_D \
  --channels_reduction_factor 6 \
  --kernel_sizes 1 3 5

!python distill.py --dataroot database/face2smile \
  --log_dir logs/cycle_gan/face2smile/inception/student/2p6B \
  --restore_teacher_G_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_A_net_G_A.pth \
  --restore_pretrained_G_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_A_net_G_A.pth \
  --restore_D_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_A_net_D_A.pth \
  --real_stat_path real_stat/face2smile_B.npz \
  --dataset_mode unaligned \
  --distiller inception \
  --gan_mode lsgan \
  --nepochs 160 \
  --nepochs_decay 160 \
  --teacher_netG inception_9blocks --student_netG inception_9blocks \
  --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 20 \
  --ndf 64 \
  --num_threads 2 \
  --eval_batch_size 2 \
  --batch_size 4 \
  --gpu_ids 0 \
  --norm_affine \
  --norm_affine_D \
  --channels_reduction_factor 6 \
  --kernel_sizes 1 3 5 \
  --lambda_distill 1.0 \
  --lambda_recon 5 \
  --prune_cin_lb 16 \
  --target_flops 2.6e9 \
  --distill_G_loss_type ka \
  --crop_size 512,256 \
  --preprocess resize_and_crop \
  --load_size 600 \
  --save_epoch_freq 1 \
  --save_latest_freq 500 \
  --direction BtoA \
  --norm_student batch \
  --padding_type_student zero \
  --norm_affine_student \
  --norm_track_running_stats_student

so now I am trying to train more like below, do you think I could get working results with these?

 --epoch_base 261 \ #best_net_epoch
 --nepochs 240 --nepochs_decay 500 \

About the blurring.

My dataset images are 178x218 and centered on the head, and I see the training model's input dimension is 256x256. Do I need to do some data preparation for that, like upscaling the training data? I saw the distilling options for horse2zebra

  --crop_size 512,256 \
  --preprocess resize_and_crop \
  --load_size 600 \

in my case, I probably want to get face crop texture only, so I am thinking of using default options like the below.

  --crop_size 286 \
  --preprocess resize_and_crop \
  --load_size 256, 256 \

Do you think this is the right approach?

Sorry for so many questions, but I wanted to make my model work and just listed all the possible questions that might be related. Please let me know if you have any opinion on how to make the model work better.

Best, Youjin

opened by youjinChung 1

I don't see `--teacher_netG ` or `--student_netG` option in distill_options.

Hello, I was running img-to-img translation notebook example and when I tried to train, I got this error below.

--teacher_netG inception_9blocks --student_netG inception_9blocks \
                                   ^
SyntaxError: invalid syntax

and this is the training block

NUM_EPOCHS = 50 

!python distill.py --dataroot ./database/ukiyoe2photo \
--log_dir logs/cycle_gan/ukiyoe2photo/inception/student/2p6B \
--restore_teacher_G_path ukiyo_teacher_iter68000_net_G_B.pth \
--restore_pretrained_G_path ukiyo_teacher_iter68000_net_G_B.pth \
--restore_D_path ukiyo_teacher_iter68000_net_D_B.pth \
--real_stat_path ukiyo_A.npz \
--dataset_mode unaligned \
--distiller inception \
--gan_mode lsgan \
--nepochs NUM_EPOCHS --nepochs_decay NUM_EPOCHS \ 
--netG inception_9blocks \
--pretrained_ngf 64 --teacher_ngf 64 --student_ngf 20 \
--ndf 64 \
--num_threads 2 \
--eval_batch_size 2 \
--batch_size 10 \
--gpu_ids 0 \
--norm_affine \
--norm_affine_D \
--channels_reduction_factor 6 \
--kernel_sizes 1 3 5 \
--lambda_distill 2.8 \
--lambda_recon 1000 \
--prune_cin_lb 16 \
--target_flops 2.6e9 \
--distill_G_loss_type ka \
--netD multi_scale \
--crop_size 512,256 \
--preprocess resize_and_crop \
--load_size 600 \
--save_epoch_freq 1 \
--save_latest_freq 500 \
--direction BtoA \
--norm_student batch \
--padding_type_student zero \
--norm_affine_student \
--norm_track_running_stats_student \

I fixed it to --netG inception_9blocks, but it still shows the same error. I checked script files in the repo and all the options are still --teacher_netG. Could you let me know how to make it work?

opened by youjinChung 1
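
One possible cause of the SyntaxError above, not confirmed in the thread: the backslash on the `--nepochs` line is followed by a space, so the line continuation ends there and the notebook tries to parse the next line as Python. In addition, `NUM_EPOCHS` is passed as a literal string rather than as the variable's value. A sketch that sidesteps both issues builds the command as a Python list. Only a subset of the flags from the post is shown, and the helper name is hypothetical.

```python
import subprocess
import sys

NUM_EPOCHS = 50

def build_distill_command(num_epochs):
    """Build the distill.py command as an argument list (no shell continuations)."""
    return [
        sys.executable, "distill.py",
        "--dataroot", "./database/ukiyoe2photo",
        "--dataset_mode", "unaligned",
        "--distiller", "inception",
        "--nepochs", str(num_epochs),
        "--nepochs_decay", str(num_epochs),
        "--teacher_netG", "inception_9blocks",
        "--student_netG", "inception_9blocks",
        "--target_flops", "2.6e9",
        # ...the remaining flags from the post would be appended the same way
    ]

command = build_distill_command(NUM_EPOCHS)
print(" ".join(command))
# To launch it from the repository root, uncomment:
# subprocess.run(command, check=True)
```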

AssertionError: Invalid device id

Hello, I encountered errors in model training. What should I do? Do you have any suggestions? Thank you very much! "AssertionError: Invalid device id"

opened by XUEXI-CL 1

Distilling doesn't work as expected.

Hello,

Since the last question, #24, I tried 512x512 resolution training for both teacher and student models. I found that the teacher model in 512x512 works fine, but student training is not working. I wonder if I can get some hints why

[Images: teacher output (Tfake) and student output (Sfake) at epoch 274/1000]

training options

!python train.py --dataroot database/face2smile \
  --model cycle_gan \
  --log_dir logs/cycle_gan/face2smile/teacher_512 \
  --netG inception_9blocks \
  --real_stat_A_path real_stat_512/face2smile_A.npz \
  --real_stat_B_path real_stat_512/face2smile_B.npz \
  --batch_size 4 \
  --num_threads 32 \
  --gpu_ids 0,1,2,3 \
  --norm_affine \
  --norm_affine_D \
  --channels_reduction_factor 6 \
  --kernel_sizes 1 3 5 \
  --save_latest_freq 10000 --save_epoch_freq 5 \
  --epoch_base 176 --iter_base 223395 \
  --nepochs 324 --nepochs_decay 500 \
  --preprocess scale_width --load_size 512 \

!python distill.py --dataroot database/face2smile \
  --dataset_mode unaligned \
  --distiller inception \
  --gan_mode lsgan \
  --log_dir logs/cycle_gan/face2smile/student_512 \
  --restore_teacher_G_path logs/cycle_gan/face2smile/teacher_512/checkpoints/170_net_G_A.pth \
  --restore_pretrained_G_path logs/cycle_gan/face2smile/teacher_512/checkpoints/170_net_G_A.pth \
  --restore_D_path logs/cycle_gan/face2smile/teacher_512/checkpoints/170_net_D_A.pth \
  --real_stat_path real_stat_512/face2smile_B.npz \
  --teacher_netG inception_9blocks --student_netG inception_9blocks \
  --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 20 \
  --ndf 64 \
  --num_threads 32 \
  --eval_batch_size 4 \
  --batch_size 32 \
  --gpu_ids 0,1,2,3 \
  --norm_affine \
  --norm_affine_D \
  --channels_reduction_factor 6 \
  --kernel_sizes 1 3 5 \
  --lambda_distill 1.0 \
  --lambda_recon 5 \
  --prune_cin_lb 16 \
  --target_flops 2.6e9 \
  --distill_G_loss_type ka \
  --preprocess scale_width --load_size 512 \
  --save_epoch_freq 2 --save_latest_freq 1000 \
  --nepochs 500 --nepochs_decay 500 \
  --norm_student batch \
  --padding_type_student zero \
  --norm_affine_student \
  --norm_track_running_stats_student

opened by youjinChung 12

CUDA out of memory

Update: I wonder if repo update is possible with `torch.amp` or mixed precision applied.

Hello,

I'd like to ask, with my dataset and machine, if it is normal to see out of memory or if I might have some programming issue.

RuntimeError: CUDA out of memory. Tried to allocate 260.00 MiB (GPU 0; 15.78 GiB total capacity; 14.02 GiB already allocated; 198.19 MiB free; 14.13 GiB reserved in total by PyTorch)

I am using an AWS p3.8xlarge (4 Tesla V100s) and trying to train a CycleGAN on 5055 images at 1024x1024 resolution.

I checked that the dataset resized to 512x512 works with batch size 4, but at 1024x1024 even batch size 1 doesn't work.
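
For a fully convolutional generator, activation memory grows with the number of pixels, so going from 512x512 to 1024x1024 roughly quadruples it. A minimal sketch of that arithmetic, using only the three down-sampling convolutions of the generator printed in the log of this issue (3→64 at stride 1, 64→128 at stride 2, 128→256 at stride 2). It assumes float32 tensors and one image per GPU, and it deliberately ignores normalisation outputs, the residual blocks, the second generator, the discriminators, and everything kept for the backward pass, so the real footprint is much larger.

```python
BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2

# (output channels, cumulative spatial down-scaling) for each convolution
# in the generator's down_sampling stage, as printed in the log.
DOWN_SAMPLING_CONVS = [(64, 1), (128, 2), (256, 4)]


def conv_output_mib(height, width, batch=1):
    """Return the size in MiB of each down-sampling convolution output."""
    sizes = []
    for channels, scale in DOWN_SAMPLING_CONVS:
        elements = batch * channels * (height // scale) * (width // scale)
        sizes.append(elements * BYTES_PER_FLOAT32 / MIB)
    return sizes


for side in (512, 1024):
    sizes = conv_output_mib(side, side)
    print(f"{side}x{side}: per layer {sizes} MiB, total {sum(sizes)} MiB")
```

Under these assumptions the three tensors take 112 MiB at 512x512 and 448 MiB at 1024x1024, a factor of four for this small slice of the network alone.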

I think we need a p4d.24xlarge for this project, but it's hard to get that instance due to the lack of zone capacity.

Possible things to try:
- Reduce the size of the dataset (but I think 5055 images is already small for training). My colleague thinks the model loads the whole dataset at once, and that this is why we would need to reduce the dataset.
- Find a memory leak?

Any comments or hints are appreciated.

Below is the log for reference.

train.py --dataroot database/face2smile \
>   --model cycle_gan \
>   --log_dir logs/cycle_gan/face2smile/teacher_1080 \
>   --netG inception_9blocks \
>   --real_stat_A_path real_stat_1080/face2smile_A.npz \
>   --real_stat_B_path real_stat_1080/face2smile_B.npz \
>   --batch_size 1 \
>   --num_threads 1 \
>   --gpu_ids 0,1,2,3 \
>   --norm_affine \
>   --norm_affine_D \
>   --channels_reduction_factor 6 \
>   --kernel_sizes 1 3 5 \
>   --save_latest_freq 10000 --save_epoch_freq 5 \
>   --nepochs 1 --nepochs_decay 0 \
>   --preprocess none
----------------- Options ---------------
                active_fn: nn.ReLU                       
              active_fn_D: nn.LeakyReLU                  
             aspect_ratio: 1.0                           
               batch_size: 4                             	[default: 1]
                    beta1: 0.5                           
                 channels: None                          
channels_reduction_factor: 6                             	[default: 1]
          cityscapes_path: database/cityscapes-origin    
                crop_size: 256, 256                      
                 dataroot: database/face2smile           	[default: None]
             dataset_mode: unaligned                     
                direction: AtoB                          
          display_winsize: 256                           
                 drn_path: drn-d-105_ms_cityscapes.pth   
             dropout_rate: 0                             
               epoch_base: 1                             
          eval_batch_size: 1                             
                 gan_mode: lsgan                         
                  gpu_ids: 0,1,2,3                       	[default: 0]
                init_gain: 0.02                          
                init_type: normal                        
                 input_nc: 3                             
                  isTrain: True                          	[default: None]
                iter_base: 1                             
             kernel_sizes: [1, 3, 5]                     	[default: [3, 5, 7]]
                 lambda_A: 10.0                          
                 lambda_B: 10.0                          
          lambda_identity: 0.5                           
           load_in_memory: False                         
                load_size: 286                           
                  log_dir: logs/cycle_gan/face2smile/teacher_1080	[default: logs]
                       lr: 0.0002                        
           lr_decay_iters: 50                            
                lr_policy: linear                        
         max_dataset_size: -1                            
                    model: cycle_gan                     	[default: pix2pix]
     moving_average_decay: 0.0                           
moving_average_decay_adjust: False                         
moving_average_decay_base_batch: 32                            
               n_layers_D: 3                             
                      ndf: 64                            
                  nepochs: 1                             	[default: 100]
            nepochs_decay: 0                             	[default: 100]
                     netD: n_layers                      
                     netG: inception_9blocks             
                      ngf: 64                            
                  no_flip: False                         
                     norm: instance                      
              norm_affine: True                          	[default: False]
            norm_affine_D: True                          	[default: False]
             norm_epsilon: 1e-05                         
            norm_momentum: 0.1                           
             norm_student: instance                      
 norm_track_running_stats: False                         
              num_threads: 32                            	[default: 4]
                output_nc: 3                             
             padding_type: reflect                       
                    phase: train                         
                pool_size: 50                            
               preprocess: none                          	[default: resize_and_crop]
               print_freq: 100                           
         real_stat_A_path: real_stat_1080/face2smile_A.npz	[default: None]
         real_stat_B_path: real_stat_1080/face2smile_B.npz	[default: None]
         restore_D_A_path: None                          
         restore_D_B_path: None                          
         restore_G_A_path: None                          
         restore_G_B_path: None                          
           restore_O_path: None                          
          save_epoch_freq: 5                             	[default: 20]
         save_latest_freq: 10000                         	[default: 20000]
                     seed: 233                           
           serial_batches: False                         
               table_path: datasets/table.txt            
          tensorboard_dir: None                          
----------------- End -------------------
train.py --dataroot database/face2smile --model cycle_gan --log_dir logs/cycle_gan/face2smile/teacher_1080 --netG inception_9blocks --real_stat_A_path real_stat_1080/face2smile_A.npz --real_stat_B_path real_stat_1080/face2smile_B.npz --batch_size 4 --num_threads 32 --gpu_ids 0,1,2,3 --norm_affine --norm_affine_D --channels_reduction_factor 6 --kernel_sizes 1 3 5 --save_latest_freq 10000 --save_epoch_freq 5 --nepochs 1 --nepochs_decay 0 --preprocess none
dataset [UnalignedDataset] was created
The number of training images = 5055
data shape is: channel=3, height=1024, width=1024.
initialize network with normal
initialize network with normal
initialize network with normal
initialize network with normal
dataset [SingleDataset] was created
dataset [SingleDataset] was created
/home/ubuntu/.local/lib/python3.9/site-packages/torchvision/models/inception.py:80: FutureWarning: The default weight initialization of inception_v3 will be changed in future releases of torchvision. If you wish to keep the old behavior (which leads to long initialization times due to scipy/scipy#11299), please set init_weights=True.
  warnings.warn('The default weight initialization of inception_v3 will be changed in future releases of '
model [CycleGANModel] was created
---------- Networks initialized -------------
DataParallel(
  (module): InceptionGenerator(
    (down_sampling): Sequential(
      (0): ReflectionPad2d((3, 3, 3, 3))
      (1): Conv2d(3, 64, kernel_size=(7, 7), stride=(1, 1))
      (2): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (3): ReLU(inplace=True)
      (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (5): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (6): ReLU(inplace=True)
      (7): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (8): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (9): ReLU(inplace=True)
    )
    (features): Sequential(
      (0): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
      (1): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
      (2): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
      (3): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
      (4): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
      (5): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
      (6): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
      (7): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
      (8): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
    )
    (up_sampling): Sequential(
      (0): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
      (1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (2): ReLU(inplace=True)
      (3): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
      (4): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (5): ReLU(inplace=True)
      (6): ReflectionPad2d((3, 3, 3, 3))
      (7): Conv2d(64, 3, kernel_size=(7, 7), stride=(1, 1))
      (8): Tanh()
    )
  )
)
[Network G_A] Total number of parameters : 8.154 M
DataParallel(
  (module): InceptionGenerator(
    (down_sampling): Sequential(
      (0): ReflectionPad2d((3, 3, 3, 3))
      (1): Conv2d(3, 64, kernel_size=(7, 7), stride=(1, 1))
      (2): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (3): ReLU(inplace=True)
      (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (5): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (6): ReLU(inplace=True)
      (7): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (8): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (9): ReLU(inplace=True)
    )
    (features): Sequential(
      (0): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
      (1): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
      (2): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
      (3): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
      (4): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
      (5): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
      (6): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
      (7): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
      (8): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
    )
    (up_sampling): Sequential(
      (0): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
      (1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (2): ReLU(inplace=True)
      (3): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
      (4): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (5): ReLU(inplace=True)
      (6): ReflectionPad2d((3, 3, 3, 3))
      (7): Conv2d(64, 3, kernel_size=(7, 7), stride=(1, 1))
      (8): Tanh()
    )
  )
)
[Network G_B] Total number of parameters : 8.154 M
DataParallel(
  (module): NLayerDiscriminator(
    (model): Sequential(
      (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
      (1): LeakyReLU(negative_slope=0.2, inplace=True)
      (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
      (3): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (4): LeakyReLU(negative_slope=0.2, inplace=True)
      (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
      (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (7): LeakyReLU(negative_slope=0.2, inplace=True)
      (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
      (9): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (10): LeakyReLU(negative_slope=0.2, inplace=True)
      (11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
    )
  )
)
[Network D_A] Total number of parameters : 2.767 M
DataParallel(
  (module): NLayerDiscriminator(
    (model): Sequential(
      (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
      (1): LeakyReLU(negative_slope=0.2, inplace=True)
      (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
      (3): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (4): LeakyReLU(negative_slope=0.2, inplace=True)
      (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
      (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (7): LeakyReLU(negative_slope=0.2, inplace=True)
      (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
      (9): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (10): LeakyReLU(negative_slope=0.2, inplace=True)
      (11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
    )
  )
)
[Network D_B] Total number of parameters : 2.767 M
-----------------------------------------------
start_epoch: 1
end_epoch: 1
total_iter: 1
current memory allocated: 265.4296875
max memory allocated: 265.4296875
cached memory: 276.0
will set input data
Traceback (most recent call last):
  File "/data/CAT/train.py", line 14, in <module>
    trainer.start()
  File "/data/CAT/trainer.py", line 159, in start
    model.optimize_parameters(total_iter)
  File "/data/CAT/models/cycle_gan_model.py", line 295, in optimize_parameters
    self.forward()
  File "/data/CAT/models/cycle_gan_model.py", line 235, in forward
    self.rec_A = self.netG_B(self.fake_B)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/CAT/models/modules/inception_architecture/inception_generator.py", line 141, in forward
    res = self.up_sampling(res)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/padding.py", line 173, in forward
    return F.pad(input, self.padding, 'reflect')
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/functional.py", line 4014, in _pad
    return torch._C._nn.reflection_pad2d(input, pad)
RuntimeError: CUDA out of memory. Tried to allocate 260.00 MiB (GPU 0; 15.78 GiB total capacity; 14.02 GiB already allocated; 198.19 MiB free; 14.13 GiB reserved in total by PyTorch)
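
The size of the failed allocation in the traceback above can be checked by hand. A minimal sketch, assuming float32 activations, one image per GPU (the printed options show `batch_size: 4` split across 4 GPUs), and that the failing layer is the `ReflectionPad2d((3, 3, 3, 3))` in `up_sampling`, whose input has 64 channels at the full 1024x1024 resolution. The rounding up to a multiple of 2 MiB is an assumption about PyTorch's caching allocator, not something the log states.

```python
import math

BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2


def reflection_pad_output_bytes(batch, channels, height, width, pad):
    """Bytes needed for the output of a symmetric 2D reflection pad (float32)."""
    padded_h = height + 2 * pad
    padded_w = width + 2 * pad
    return batch * channels * padded_h * padded_w * BYTES_PER_FLOAT32


nbytes = reflection_pad_output_bytes(batch=1, channels=64, height=1024, width=1024, pad=3)

# Assumption: large requests are rounded up to a multiple of 2 MiB.
rounded_mib = math.ceil(nbytes / (2 * MIB)) * 2

print(nbytes, "bytes =", round(nbytes / MIB, 2), "MiB")  # 271590400 bytes, about 259.01 MiB
print("rounded request:", rounded_mib, "MiB")            # 260 MiB
```

If these assumptions hold, the request that failed is a single full-resolution feature map of the second generator pass (`self.netG_B(self.fake_B)`), which is consistent with the activations at 1024x1024 exhausting the 15.78 GiB card rather than with a leak or with the dataset being loaded at once (the log also shows `load_in_memory: False`).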

opened by youjinChung 0

Key Error in training pix2pix with unaligned data

I have been trying to train a pix2pix model on the ukiyoe2photo dataset as shown in this notebook by SnapML. Evaluating on batches from the loader built by the create_eval_dataloader function raises a KeyError, with the following log.

Traceback (most recent call last):
  File "train.py", line 14, in <module>
    trainer.start()
  File "/content/CAT/trainer.py", line 328, in start
    (epoch, total_iter))
  File "/content/CAT/trainer.py", line 272, in evaluate
    metrics = self.model.evaluate_model(iter)
  File "/content/CAT/models/pix2pix_model.py", line 507, in evaluate_model
    self.set_input(data_i)
  File "/content/CAT/models/pix2pix_model.py", line 439, in set_input
    self.real_A = input['A' if AtoB else 'B'].to(self.device)
KeyError: 'B'

It seems that the data produced by the eval_dataloader contains only two keys, 'A' and 'A_paths', and has no 'B' or 'B_paths' field. It would be great if you could help me with this.
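
One way to make this failure easier to diagnose is to check the batch keys before indexing. A minimal sketch in plain Python, assuming batches are dictionaries keyed by 'A', 'B', 'A_paths' and 'B_paths' as described above. The helper `select_inputs` is hypothetical and is not part of the repository; it only illustrates the check.

```python
def select_inputs(batch, direction="AtoB"):
    """Return (source, target) from a paired batch, with a descriptive error.

    `batch` is assumed to be a dict such as {'A': ..., 'B': ..., 'A_paths': ...}.
    """
    a_to_b = direction == "AtoB"
    source_key, target_key = ("A", "B") if a_to_b else ("B", "A")
    missing = [key for key in (source_key, target_key) if key not in batch]
    if missing:
        raise KeyError(
            f"batch is missing {missing}; available keys are {sorted(batch)}. "
            "A paired model needs both domains in every batch."
        )
    return batch[source_key], batch[target_key]


# A batch shaped like the one reported in this issue: only domain A is present.
single_domain_batch = {"A": "tensor_A", "A_paths": "path/to/a.jpg"}
try:
    select_inputs(single_domain_batch, direction="BtoA")
except KeyError as error:
    print("caught:", error)
```

With such a check the error names the keys the loader actually produced, which helps confirm whether the evaluation loader is yielding single-domain batches while the model expects paired ones.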

opened by Ashish-Abraham 0

`restore` options to resume `distill.py`

Hello,

I got this error when exporting to ONNX.

Traceback (most recent call last):
  File "/home/ubuntu/CAT/onnx_export.py", line 13, in <module>
    exporter = Exporter()
  File "/home/ubuntu/CAT/onnx_exporter.py", line 59, in __init__
    model.netG_student.load_state_dict(
  File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for InceptionGenerator:
	size mismatch for down_sampling.7.weight: copying a param with shape torch.Size([234, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([230, 16, 3, 3]).
	size mismatch for down_sampling.7.bias: copying a param with shape torch.Size([234]) from checkpoint, the shape in current model is torch.Size([230]).
...
and many more mismatches like these

This is strange, since I checked that it worked before and actually exported models. Here are my commands for distilling and exporting.

!python distill.py --dataroot database/face2smile \
  --dataset_mode unaligned \
  --distiller inception \
  --gan_mode lsgan \
  --log_dir logs/cycle_gan/face2smile/inception/student/2p6B \
  --restore_teacher_G_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_B_net_G_A.pth \
  --restore_pretrained_G_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_B_net_G_A.pth \
  --restore_D_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_B_net_D_A.pth \
  --real_stat_path real_stat/face2smile_B.npz \
  --nepochs 500 --nepochs_decay 500 \
  --teacher_netG inception_9blocks --student_netG inception_9blocks \
  --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 20 \
  --ndf 64 \
  --num_threads 80 \
  --eval_batch_size 4 \
  --batch_size 80 \
  --gpu_ids 0,1,2,3 \
  --norm_affine \
  --norm_affine_D \
  --channels_reduction_factor 6 \
  --kernel_sizes 1 3 5 \
  --lambda_distill 1.0 \
  --lambda_recon 5 \
  --prune_cin_lb 16 \
  --target_flops 2.6e9 \
  --distill_G_loss_type ka \
  --save_epoch_freq 1 \
  --save_latest_freq 500 \
  --norm_student batch \
  --padding_type_student zero \
  --norm_affine_student \
  --norm_track_running_stats_student

!python3 onnx_export.py --dataroot database/face2smile \
  --log_dir onnx_files/cycle_gan/face2smile/inception/student/2p6B \
  --restore_teacher_G_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_A_net_G_A.pth \
  --pretrained_student_G_path logs/cycle_gan/face2smile/inception/student/2p6B/checkpoints/best_net_G.pth \
  --real_stat_path real_stat/face2smile_B.npz \
  --dataset_mode unaligned \
  --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 20 \
  --gpu_ids 0 \
  --norm_affine \
  --channels_reduction_factor 6 \
  --kernel_sizes 1 3 5 \
  --prune_cin_lb 16 \
  --target_flops 2.6e9 \
  --ndf 64 \
  --batch_size 8 \
  --eval_batch_size 2 \
  --num_threads 8 \
  --norm_affine_D \
  --teacher_netG inception_9blocks --student_netG inception_9blocks \
  --distiller inception \
  --gan_mode lsgan \
  --norm_student batch \
  --padding_type_student zero \
  --norm_affine_student \
  --norm_track_running_stats_student
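
One detail that may be worth checking: the distillation command passes `best_B_net_G_A.pth` as `--restore_teacher_G_path`, while the export command passes `best_A_net_G_A.pth`. Since the student architecture is obtained by pruning the teacher, a different teacher checkpoint could plausibly yield different channel counts (234 versus 230 above). This is a guess from the commands, not a confirmed diagnosis. To list every mismatch at once, a minimal sketch in plain Python is shown below. The helper `diff_shapes` is hypothetical, and the two dictionaries are filled with the shapes from the error message.

```python
def diff_shapes(checkpoint_shapes, model_shapes):
    """Return (name, checkpoint_shape, model_shape) for every differing entry.

    Both arguments map parameter names to shape tuples. A name present on
    only one side is reported with None on the other side.
    """
    problems = []
    for name in sorted(set(checkpoint_shapes) | set(model_shapes)):
        ckpt_shape = checkpoint_shapes.get(name)
        model_shape = model_shapes.get(name)
        if ckpt_shape != model_shape:
            problems.append((name, ckpt_shape, model_shape))
    return problems


# Shapes copied from the error message. With PyTorch they could be collected
# as {k: tuple(v.shape) for k, v in state_dict.items()}, assuming the
# checkpoint file stores a plain state_dict.
checkpoint_shapes = {
    "down_sampling.7.weight": (234, 16, 3, 3),
    "down_sampling.7.bias": (234,),
}
model_shapes = {
    "down_sampling.7.weight": (230, 16, 3, 3),
    "down_sampling.7.bias": (230,),
}

for name, ckpt_shape, model_shape in diff_shapes(checkpoint_shapes, model_shapes):
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

If the listed differences disappear when the export is rerun with the same teacher checkpoint that was used for distillation, that would support the explanation above.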

opened by youjinChung 7


Owner

Snap Research

This repo uses a combination of logits and feature distillation method to teach the PSPNet model of ResNet18 backbone with the PSPNet model of ResNet50 backbone. All the models are trained and tested on the PASCAL-VOC2012 dataset.

Much faster than SORT (Simple Online and Realtime Tracking), a little worse than SORT

[ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

[cvpr22] Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation

A tensorflow model that predicts if the image is of a cat or of a dog.

SIEM Logstash parsing for more than hundred technologies

Research shows Google collects 20x more data from Android than Apple collects from iOS. Block this non-consensual telemetry using pihole blocklists.

We are More than Our JOints: Predicting How 3D Bodies Move

An AI Assistant More Than a Toolkit

SuperSDR: multiplatform KiwiSDR + CAT transceiver integrator

Official implement of "CAT: Cross Attention in Vision Transformer".

use tensorflow 2.0 to tell a dog and cat from a specified picture

In this project we use both Resnet and Self-attention layer for cat, dog and flower classification.

"Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking" (CVPR 2021); "Masksembles for Uncertainty Estimation" (CVPR 2021)

[ICLR 2021] Is Attention Better Than Matrix Decomposition?

[NeurIPS 2021] Better Safe Than Sorry: Preventing Delusive Adversaries with Adversarial Training

[CVPR 21] Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.

Implementation of "RaScaNet: Learning Tiny Models by Raster-Scanning Image" from CVPR 2021.
