[CVPR 2021] Teachers Do More Than Teach: Compressing Image-to-Image Models (CAT)

PyTorch implementation of our method for compressing image-to-image models.
Teachers Do More Than Teach: Compressing Image-to-Image Models
Qing Jin1, Jian Ren2, Oliver J. Woodford, Jiazhuo Wang2, Geng Yuan1, Yanzhi Wang1, Sergey Tulyakov2
1Northeastern University, 2Snap Inc.
In CVPR 2021.

Overview

Compression And Teaching (CAT) framework for compressing image-to-image models: ① Given a pre-trained teacher generator Gt, we determine the architecture of a compressed student generator Gs by eliminating the channels with the smallest batch-norm scaling-factor magnitudes. ② We then distill knowledge from the pre-trained teacher Gt into the student Gs via a novel distillation technique that maximizes the similarity between the features of the two generators, where similarity is defined in terms of kernel alignment (KA).
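
For illustration, below is a minimal sketch of (linear, uncentered) kernel alignment between a batch of teacher and student features; the exact KA loss in this repo (selected via --distill_G_loss_type ka) may differ in details such as centering and how the features are collected:

import torch

def kernel_alignment(feat_t: torch.Tensor, feat_s: torch.Tensor) -> torch.Tensor:
    # Flatten each feature map to (batch, features).
    x = feat_t.flatten(1)
    y = feat_s.flatten(1)
    # Gram (linear kernel) matrices of shape (batch, batch).
    kx = x @ x.t()
    ky = y @ y.t()
    # Normalized Frobenius inner product; 1.0 means perfectly aligned kernels.
    return (kx * ky).sum() / (kx.norm() * ky.norm())

# A distillation loss can then minimize, e.g., 1 - kernel_alignment(feat_t, feat_s).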

Prerequisites

  • Linux
  • Python 3
  • CPU or NVIDIA GPU + CUDA CuDNN

Getting Started

Installation

  • Clone this repo:

    git clone [email protected]:snap-research/CAT.git
    cd CAT
  • Install PyTorch 1.7 and other dependencies (e.g., torchvision).

    • For pip users, please type the command pip install -r requirements.txt.
    • For Conda users, please create a new Conda environment using conda env create -f environment.yml.

Data Preparation

CycleGAN

Setup

  • Download the CycleGAN dataset (e.g., horse2zebra).

    bash datasets/download_cyclegan_dataset.sh horse2zebra
  • Get the statistics of the ground-truth images of your dataset to compute FID. We provide pre-computed statistics of the real images for several datasets on the Google Drive Folder; an example download command is shown below.
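
    For example, assuming horse2zebra is among the provided datasets (the script and the A/B naming follow the Cityscapes example later in this README):

    bash datasets/download_real_stat.sh horse2zebra A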

Pix2pix

Setup

  • Download the pix2pix dataset (e.g., cityscapes).

    bash datasets/download_pix2pix_dataset.sh cityscapes

Cityscapes Dataset

For the Cityscapes dataset, we cannot provide it due to licensing issues. Please download the dataset from https://cityscapes-dataset.com and use the script prepare_cityscapes_dataset.py to preprocess it. You need to download gtFine_trainvaltest.zip and leftImg8bit_trainvaltest.zip and unzip them in the same folder. For example, you may put gtFine and leftImg8bit in database/cityscapes-origin. Then prepare the dataset with the following commands:

python datasets/get_trainIds.py database/cityscapes-origin/gtFine/
python datasets/prepare_cityscapes_dataset.py \
--gtFine_dir database/cityscapes-origin/gtFine \
--leftImg8bit_dir database/cityscapes-origin/leftImg8bit \
--output_dir database/cityscapes \
--table_path datasets/table.txt

You will get a preprocessed dataset in database/cityscapes and a mapping table (used to compute mIoU) in datasets/table.txt.

  • Get the statistics of the ground-truth images of your dataset to compute FID. We provide pre-computed real statistics for several datasets. For example,

    bash datasets/download_real_stat.sh cityscapes A

Evaluation Preparation

mIoU Computation

To support mIoU computation, you need to download the pre-trained DRN model drn-d-105_ms_cityscapes.pth from http://go.yf.io/drn-cityscapes-models. By default, we put the DRN model in the root directory of our repo. You can then test our compressed models on Cityscapes after downloading them.
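
For reference, mIoU averages the per-class intersection-over-union TP / (TP + FP + FN) over all classes. A minimal sketch of the metric computed from a pixel-level confusion matrix (independent of the DRN code used here) could be:

import numpy as np

def mean_iou(conf: np.ndarray) -> float:
    # conf: (C, C) confusion matrix with rows = ground truth, columns = prediction.
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1)  # guard against division by zero for absent classes
    return float(iou.mean())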

FID/KID Computation

To compute the FID/KID score, you need some statistics from the ground-truth images of your dataset. We provide a script get_real_stat.py to extract this information. For example, for the map2arial dataset, you could run the following command:

python get_real_stat.py \
--dataroot database/map2arial \
--output_path real_stat/maps_B.npz \
--direction AtoB

For paired image-to-image translation (pix2pix and GauGAN), we calculate the FID between generated test images and real test images. For unpaired image-to-image translation (CycleGAN), we calculate the FID between generated test images and real training+test images. This allows us to use more images for a stable FID evaluation, as done in previous unconditional GAN research. The difference between the two protocols is small; the FID of our compressed CycleGAN model increases by 4 when using real test images instead of real training+test images.
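
For reference, FID is the Fréchet distance between Gaussians fitted to Inception features of the two image sets: FID = ||mu_1 - mu_2||^2 + Tr(Sigma_1 + Sigma_2 - 2(Sigma_1 Sigma_2)^{1/2}). A minimal sketch that evaluates this from two (mu, sigma) pairs, such as those stored in the real-statistics .npz files, might look like:

import numpy as np
from scipy import linalg

def fid_from_stats(mu1, sigma1, mu2, sigma2):
    # Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)  # matrix square root of the covariance product
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts caused by numerical error
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))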

KID is not supported for the Cityscapes dataset.

Model Training

Teacher Training

The first step of our framework is to train a teacher model. For this purpose, please run the script train_inception_teacher.sh under the corresponding folder named after the dataset. For example, run

bash scripts/cycle_gan/horse2zebra/train_inception_teacher.sh

Student Training

With the pretrained teacher model, we can determine the architecture of the student model under a prescribed computational budget. For this purpose, please run the script train_inception_student_XXX.sh under the corresponding folder named after the dataset, where XXX stands for the computational budget (in terms of FLOPs in this case) and can differ across datasets and models. For example, for CycleGAN on the Horse2Zebra dataset, our computational budget is 2.6B FLOPs, so we run

bash scripts/cycle_gan/horse2zebra/train_inception_student_2p6B.sh
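
To illustrate step ① from the overview, here is a hedged sketch of ranking channels by the magnitude of their batch-norm scaling factors (gamma); the actual search in this repo is more involved, since it must also satisfy the FLOPs budget (--target_flops) and channel lower bounds (--prune_cin_lb):

import torch.nn as nn

def rank_channels_by_bn_scale(model: nn.Module):
    # Collect (|gamma|, layer name, channel index) for every affine BatchNorm2d channel;
    # channels with the smallest |gamma| are the first candidates for elimination.
    scores = []
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm2d) and module.weight is not None:
            for idx, gamma in enumerate(module.weight.detach().abs().tolist()):
                scores.append((gamma, name, idx))
    return sorted(scores)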

Pre-trained Models

For convenience, we also provide pretrained teacher and student models on the Google Drive Folder.

Model Evaluation

With the pretrained teacher and student models, we can evaluate them on the dataset. For this purpose, please run the script evaluate_inception_student_XXX.sh under the corresponding folder named after the dataset, where XXX is the computational budget (in terms of FLOPs). For example, for CycleGAN on the Horse2Zebra dataset, where the computational budget is 2.6B FLOPs, please run

bash scripts/cycle_gan/horse2zebra/evaluate_inception_student_2p6B.sh

Model Export

The final step is to export the trained compressed model as an ONNX file to run on mobile devices. For this purpose, please run the script onnx_export_inception_student_XXX.sh under the corresponding folder named after the dataset, where XXX is the computational budget (in terms of FLOPs). For example, for CycleGAN on the Horse2Zebra dataset, where the computational budget is 2.6B FLOPs, please run

bash scripts/cycle_gan/horse2zebra/onnx_export_inception_student_2p6B.sh

This will create a .onnx file in addition to the log files.
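
For reference, the underlying export is a standard torch.onnx.export call; a self-contained sketch (with a placeholder generator standing in for the trained student, plus an assumed input shape and opset) looks roughly like:

import torch
import torch.nn as nn

# Placeholder generator; in practice this is the trained compressed student generator.
student = nn.Sequential(nn.Conv2d(3, 3, kernel_size=3, padding=1), nn.Tanh())
student.eval()

dummy_input = torch.randn(1, 3, 256, 256)  # NCHW; 256x256 matches the default crop size
torch.onnx.export(student, dummy_input, "student.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=11)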

Citation

If you use this code for your research, please cite our paper.

@inproceedings{jin2021teachers,
  title={Teachers Do More Than Teach: Compressing Image-to-Image Models},
  author={Jin, Qing and Ren, Jian and Woodford, Oliver J and Wang, Jiazhuo and Yuan, Geng and Wang, Yanzhi and Tulyakov, Sergey},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

Acknowledgements

Our code is developed based on AtomNAS and gan-compression.

We also thank pytorch-fid for FID computation and drn for mIoU computation.

Comments
  • `get_real_stat.py` for custom dataset

    Hello,

I am testing CycleGAN as the README documents. It seems I need .npz files for training, and when I look at the command here

    !python get_real_stat.py \
    --dataroot database/horse2zebra \
    --output_path real_stat/horse2zebra_A.npz \
    --direction AtoB
    

and when I look at the --dataroot flag, it requires trainA, trainB, valA, valB, plus train, val, etc. folders.

        parser.add_argument(
            '--dataroot',
            required=True,
            help=
            'path to images (should have subfolders trainA, trainB, valA, valB, train, val, etc)'
        )
    

I see CycleGAN usually requires 4 folders, like trainA, trainB, testA (valA), testB (valB), but I am not sure what the train, val, etc. folders are for. Could you explain how to set up those folders? Or is there a way to train without .npz files?

    opened by youjinChung 10
  • training question

    Hello again,

I am training CycleGAN with the FFHQ dataset, trying to make a filter that makes the user smile. I checked my recent training, but the result wasn't satisfying, so I'd like to ask you for advice on improving the model's performance.

Initially, I trained with 2500 unaligned pairs of smiling/not-smiling 1024x1024 photos and got fid_B: 88.244 with --nepochs 500 --nepochs_decay 500. While training the teacher model, I found the best model at epoch 181/1000. I kept training until the 841st epoch but couldn't find a better one. For distilling, I got fid: 87.976 at epoch 270 with the options --nepochs 500 --nepochs_decay 500.

So I added 2500 more unaligned pairs to the dataset and am training again now. I am eager to make this filter look great. Any advice is appreciated.

Also, one more question: is it right to resume from latest_net rather than the best_net I have? With the same learning-rate options, I guess resuming from the best net, or 180_net, would just repeat what I trained last time. Please let me know if my assumption is correct.

    opened by youjinChung 6
  • Error when distilling CycleGan

Hi again, I am now trying to use CAT to train and distill a CycleGAN transformation. The training worked well, but when I start distilling with:

     !python distill.py --dataroot /local_path/ \
      --dataset_mode unaligned \
      --distiller inception \
      --gan_mode lsgan \
      --log_dir /local_path/student \
      --restore_pretrained_G_path /local_path/teacher/checkpoints/best_A_net_G_A.pth \
      --restore_teacher_G_path /local_path/teacher/checkpoints/best_A_net_G_A.pth \
      --restore_D_path /local_path/teacher/checkpoints/best_A_net_D_A.pth \
      --real_stat_path /local_path/out_stat_B.npz\
      --nepochs 500 --nepochs_decay 500 \
      --save_latest_freq 25000 --save_epoch_freq 25 \
      --teacher_netG inception_9blocks --student_netG inception_9blocks \
      --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 24 \
      --eval_batch_size 2 \
      --gpu_ids 0 \
      --norm batch \
      --norm_affine \
      --norm_affine_D \
      --norm_track_running_stats \
      --channels_reduction_factor 6 \
      --kernel_sizes 1 3 5 \
      --lambda_distill 1.0 \
      --prune_cin_lb 16 \
      --target_flops 6.6e9 \
      --distill_G_loss_type ka
    

But I keep getting a RuntimeError (similar to the one fixed in Issue #11: https://github.com/snap-research/CAT/issues/11):

    Load network at /local_path/teacher/checkpoints/best_A_net_G_A.pth
    Traceback (most recent call last):
      File "distill.py", line 13, in <module>
        trainer = Trainer('distill')
      File "/content/drive/MyDrive/CAT/trainer.py", line 80, in __init__
        model.setup(opt)
      File "/content/drive/MyDrive/CAT/distillers/base_inception_distiller.py", line 260, in setup
        self.load_networks(verbose)
      File "/content/drive/MyDrive/CAT/distillers/inception_distiller.py", line 192, in load_networks
        super(InceptionDistiller, self).load_networks(prune_continue)
      File "/content/drive/MyDrive/CAT/distillers/base_inception_distiller.py", line 362, in load_networks
        verbose)
      File "/content/drive/MyDrive/CAT/utils/util.py", line 139, in load_network
        net.load_state_dict(weights)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for InceptionGenerator:
    	Missing key(s) in state_dict: "down_sampling.2.running_mean", "down_sampling.2.running_var",...
    

    Lots of other missing keys.

Please let me know if there is something wrong or missing in my command line. (BTW, the teacher training produced 4 generator networks; which one should I use to distill the A-to-B generator?)

    opened by jvillegassmule 6
  • Module Error in pix2pix distillation

I have built a fully functional pix2pix model using custom code. I downloaded this pre-trained model I made, but I am not able to compress it due to the following error.

    RuntimeError: Error(s) in loading state_dict for InceptionGenerator:
    Missing key(s) in state_dict: "down_sampling.1.weight", "down_sampling.2.weight", "down_sampling.2.bias", .....
    Unexpected key(s) in state_dict: "final_conv.weight", "final_conv.bias", "encoders.0.conv.weight","encoders.1.bn.bias", "encoders.1.bn.running_mean",......
    

    Just wanted to know if the distillation code works only for the architecture trained using the code in models/pix2pix_model.py and trainer.py. The code I used to train the pix2pix GAN is here.

    opened by Ashish-Abraham 4
  • Errors when trying to resume distilling.

1. I have the image pairs for training ready in the /local_path dir. I successfully trained the teacher with the command line:
    !python train.py --dataroot /local_path \
      --model pix2pix \
      --log_dir /local_path/logs/teacher \
      --netG inception_9blocks \
      --lambda_recon 10 \
      --nepochs 500 --nepochs_decay 1000 \
      --norm batch \
      --norm_affine \
      --norm_affine_D \
      --norm_track_running_stats \
      --channels_reduction_factor 6 \
      --preprocess none \
      --kernel_sizes 1 3 5 \
      --save_epoch_freq 50 --save_latest_freq 20000 \
      --direction AtoB \
      --real_stat_path /local_path/out_stat.npz
    
2. I got a folder full of model checkpoints in /local_path/checkpoints. I was able to resume the training session with something like:
    !python train.py --dataroot /local_path \
      --model pix2pix \
      --log_dir /local_path/logs/teacher \
      --netG inception_9blocks \
      --lambda_recon 10 \
      --nepochs 0 --nepochs_decay 750 \
      --norm batch \
      --norm_affine \
      --norm_affine_D \
      --norm_track_running_stats \
      --channels_reduction_factor 6 \
      --preprocess none \
      --kernel_sizes 1 3 5 \
      --save_epoch_freq 50 --save_latest_freq 20000 \
      --direction AtoB \
      --real_stat_path /local_path/out_stat.npz \
      --epoch_base 750 \
      --iter_base 300001 \
      --restore_G_path /local_path/logs/teacher/checkpoints/latest_net_G.pth \
      --restore_D_path /local_path/logs/teacher/checkpoints/latest_net_D.pth
    

After training, the results in the eval/(it_number)/fake folder are acceptable.

3. Then, I was able to run the distiller with the command:
     !python distill.py --dataroot /local_path \
      --distiller inception \
      --log_dir /local_path/logs/student \
      --restore_teacher_G_path /local_path/logs/teacher/checkpoints/best_net_G.pth \
      --restore_pretrained_G_path /local_path/logs/teacher/checkpoints/best_net_G.pth \
      --restore_D_path /local_path/logs/teacher/checkpoints/best_net_D.pth \
      --real_stat_path /local_path/out_stat.npz \
      --nepochs 500 --nepochs_decay 750 \
      --save_latest_freq 25000 --save_epoch_freq 25 \
      --teacher_netG inception_9blocks --student_netG inception_9blocks \
      --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 24 \
      --eval_batch_size 2 \
      --gpu_ids 0 \
      --norm batch \
      --norm_affine \
      --norm_affine_D \
      --norm_track_running_stats \
      --channels_reduction_factor 6 \
      --kernel_sizes 1 3 5 \
      --direction AtoB \
      --lambda_distill 2.0 \
      --prune_cin_lb 16 \
      --target_flops 2.6e9 \
      --distill_G_loss_type ka
    

I had to stop the session before it finished; again, different checkpoint models were saved in the folder /local_path/logs/student/checkpoints, including .pth files for G, D, optim-0, optim-1, A-0, A1, A2, and A3. Progress seems OK in the local_path/logs/student/eval folder.

4. I tried to resume distilling with the command line:
    !python distill.py --dataroot /local_path \
      --distiller inception \
      --log_dir /local_path/logs/student \
      --restore_teacher_G_path /local_path/logs/teacher/checkpoints/best_net_G.pth \
      --restore_pretrained_G_path /local_path/logs/student/checkpoints/latest_net_G.pth \
      --restore_D_path /local_path/logs/student/checkpoints/latest_net_D.pth \
      --restore_student_G_path /local_path/logs/student/checkpoints/latest_net_G.pth\
      --pretrained_student_G_path /local_path/logs/student/checkpoints/latest_net_G.pth\
      --restore_A_path /local_path/logs/student/checkpoints/latest_net_A \
      --restore_O_path /local_path/logs/student/checkpoints/latest_optim \
      --real_stat_path /local_path/out_stat.npz \
      --nepochs 0 --nepochs_decay 325 \
      --save_latest_freq 25000 --save_epoch_freq 25 \
      --teacher_netG inception_9blocks --student_netG inception_9blocks \
      --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 24 \
      --eval_batch_size 2 \
      --gpu_ids 0 \
      --norm batch \
      --norm_affine \
      --norm_affine_D \
      --norm_track_running_stats \
      --channels_reduction_factor 6 \
      --kernel_sizes 1 3 5 \
      --direction AtoB \
      --lambda_distill 2.0 \
      --prune_cin_lb 16 \
      --target_flops 2.6e9 \
      --distill_G_loss_type ka \
      --epoch_base 925 \
      --iter_base 370000
    

    But now I get this error:

    Load network at /local_path/logs/student/checkpoints/latest_net_G.pth
    Traceback (most recent call last):
      File "distill.py", line 13, in <module>
        trainer = Trainer('distill')
      File "/content/CAT/trainer.py", line 80, in __init__
        model.setup(opt)
      File "/content/CAT/distillers/base_inception_distiller.py", line 260, in setup
        self.load_networks(verbose)
      File "/content/CAT/distillers/inception_distiller.py", line 203, in load_networks
        super(InceptionDistiller, self).load_networks()
      File "/content/CAT/distillers/base_inception_distiller.py", line 368, in load_networks
        self.opt.restore_student_G_path, verbose)
      File "/content/CAT/utils/util.py", line 139, in load_network
        net.load_state_dict(weights)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for InceptionGenerator:
    	Missing key(s) in state_dict: "down_sampling.1.bias", "down_sampling.2.weight", "down_sampling.2.bias", "down_sampling.2.running_mean", "down_sampling.2.running_var", "down_sampling.2.num_batches_tracked", "down_sampling.4.bias", "down_sampling.5.weight", "down_sampling.5.bias", "down_sampling.5.running_mean", "down_sampling.5.running_var", "down_sampling.5.num_batches_tracked", "down_sampling.7.bias"
    

    ... lots of other missing layers, then

    	size mismatch for down_sampling.1.weight: copying a param with shape torch.Size([16, 3, 7, 7]) from checkpoint, the shape in current model is torch.Size([24, 3, 7, 7]).
    	size mismatch for down_sampling.4.weight: copying a param with shape torch.Size([16, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([48, 24, 3, 3]).
    	size mismatch for down_sampling.7.weight: copying a param with shape torch.Size([210, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([96, 48, 3, 3]).
    

    ... lots of other size mismatches.

It seems to me that there is a mismatch between the network created internally and the previously trained model being loaded into it. I am not sure if it is a bug or if something is wrong in the command line I am using to resume.

    Any help will be appreciated.

    opened by jvillegassmule 3
  • distilling freezes with large dimension

    Hello,

Since my last CycleGAN model was too blurry, I brought a new 1024x1024 dataset. Now I am testing the pipeline and found that distilling freezes with my specified dimensions:

      --preprocess resize_and_crop \
      --load_size 1024 \
      --crop_size 1024,1024 \
    

It didn't work with --preprocess none either:

     --preprocess none
    

This is the last log I can see; distilling doesn't proceed past this point:

    ...
    features.8.res_ops.0.1.1.weight
    features.8.res_ops.1.1.1.weight
    features.8.res_ops.2.1.1.weight
    features.8.dw_ops.0.0.1.weight
    features.8.dw_ops.1.0.1.weight
    features.8.dw_ops.2.0.1.weight
    scale range: [0.9916747808456421, 1.014320731163025]
    

    I've tested with some other options, like

1. without those options (with the default options)
2. with a size of 500:
      --preprocess resize_and_crop \
      --load_size 500 \
      --crop_size 500,500 \
    

and distilling worked in those cases.

I wonder if it froze because the size is too large or because I misunderstood something about the sizing. Also, I saw that the options use 256x256 as the default size and that the other tutorial scripts use the same sizing; I wonder if I still need to stick with this sizing for high-resolution face-changing filters. I am training face filters that make people smile.

Just in case, these are the full options I used for distill.py:

    !python distill.py --dataroot database/face2smile \
      --dataset_mode unaligned \
      --distiller inception \
      --gan_mode lsgan \
      --log_dir logs/cycle_gan/face2smile/inception/student/2p6B \
      --restore_teacher_G_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_A_net_G_A.pth \
      --restore_pretrained_G_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_A_net_G_A.pth \
      --restore_D_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_A_net_D_A.pth \
      --real_stat_path real_stat/face2smile_B.npz \
      --teacher_netG inception_9blocks --student_netG inception_9blocks \
      --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 20 \
      --ndf 64 \
      --num_threads 80 \
      --eval_batch_size 4 \
      --batch_size 80 \
      --gpu_ids 0,1,2,3 \
      --norm_affine \
      --norm_affine_D \
      --channels_reduction_factor 6 \
      --kernel_sizes 1 3 5 \
      --lambda_distill 1.0 \
      --lambda_recon 5 \
      --prune_cin_lb 16 \
      --target_flops 2.6e9 \
      --distill_G_loss_type ka \
      --save_epoch_freq 1 \
      --save_latest_freq 500 \
      --norm_student batch \
      --padding_type_student zero \
      --norm_affine_student \
      --norm_track_running_stats_student \
      --preprocess resize_and_crop \
      --load_size 1024\
      --crop_size 500,500 \
      --nepochs 1 --nepochs_decay 0
    
    opened by youjinChung 2
  • restore options and recommended epoch number

    Hello,

Thanks to your help, I am training a CycleGAN with face images for a Snap lens. I found some restore options in the tutorial Jupyter notebook:

    --restore_teacher_G_path ukiyo_teacher_iter68000_net_G_B.pth \
    --restore_pretrained_G_path ukiyo_teacher_iter68000_net_G_B.pth \
    --restore_D_path ukiyo_teacher_iter68000_net_D_B.pth \
    

but when I tried to find them in the options folder, I could only find the --restore_O_path option in train_options.py, so my questions are:

1. What are the right options for resuming training/distilling?
    --epoch_base XXX \ # last epoch + 1?
    --iter_base XXX \ # last iteration + 1?
    --nepochs XXX \ # total goal epochs?
    --nepochs_decay XXX \ #total goal epochs with lr decay?
    
    #and are these options below only for distilling?
    --restore_teacher_G_path xxx.pth \
    --restore_pretrained_G_path xxx.pth \
    --restore_D_path xxx.pth
    
2. nepochs/nepochs_decay recommendation: when I check the scripts folder, I see --nepochs 500 --nepochs_decay 500. I am training with 4 GPUs, but it still takes quite a while. I wonder how many epochs are recommended for desirable results.
    opened by youjinChung 2
  • get_real_stat.py for custom dataset

I am working on a pix2pix model and ran this code, with the folder structure specified in get_real_stat.py, to get the real-stats .npz files:

    !python get_real_stat.py \
    --dataroot database/sketch/trainA \
    --dataset_mode single \
    --output_path real_stat/sketch_stats_train_A.npz \
    --direction BtoA \
    --phase train
    
    !python get_real_stat.py \
    --dataroot database/sketch/trainB \
    --dataset_mode single \
    --output_path real_stat/sketch_stats_train_B.npz \
    --direction BtoA \
    --phase train
    

I am facing this error:

    Traceback (most recent call last):
      File "get_real_stat.py", line 146, in <module>
        main(opt)
      File "get_real_stat.py", line 40, in main
        tensors = torch.cat(tensors, dim=0)
    RuntimeError: Sizes of tensors must match except in dimension 0. Got 256 and 560 in dimension 2 (The offending index is 909)
Should I edit the dataloader for this, or is it something else? Any help is appreciated.

    opened by Ashish-Abraham 1
  • custom cyclegan training epoch and data preparation

    Hello,

I trained a CycleGAN with the CelebA dataset. My goal is to make non-smiling faces smile. I checked this dataset with another CycleGAN repo, and it worked pretty well with 300 epochs. So I trained the CAT CycleGAN with --nepochs 160 --nepochs_decay 160 for training and distilling. I read that other scripts such as horse2zebra use --nepochs 500 --nepochs_decay 500, so I guess my training epochs were too few, but I'd like to ask your opinion first.

I can only see a slight sign that the CycleGAN blurred the male's mouth. Another concern is that the filter is too blurry and seems low resolution. So my questions are:

    Do you think that is due to the small number of epochs I trained?

Let me share the commands I used for training. I also wonder if my get_real_stat command is wrong:

    !python get_real_stat.py --dataroot database/face2smile/valB --output_path real_stat/face2smile_B.npz --direction AtoB --dataset_mode single
    !python get_real_stat.py --dataroot database/face2smile/valA --output_path real_stat/face2smile_A.npz --direction BtoA --dataset_mode single
    
    !python train.py --dataroot database/face2smile \
      --model cycle_gan \
      --log_dir logs/cycle_gan/face2smile/inception/teacher \
      --netG inception_9blocks \
      --real_stat_A_path real_stat/face2smile_A.npz \
      --real_stat_B_path real_stat/face2smile_B.npz \
      --batch_size 16 \
      --restore_G_A_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/140_net_G_A.pth\
      --restore_D_A_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/140_net_D_A.pth\
      --restore_G_B_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/140_net_G_B.pth\
      --restore_D_B_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/140_net_D_B.pth\
      --epoch_base 141 \
      --nepochs 10 --nepochs_decay 140 \
      --num_threads 16 \
      --gpu_ids 0,1,2,3 \
      --norm_affine \
      --norm_affine_D \
      --channels_reduction_factor 6 \
      --kernel_sizes 1 3 5
    
    !python distill.py --dataroot database/face2smile \
      --log_dir logs/cycle_gan/face2smile/inception/student/2p6B \
      --restore_teacher_G_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_A_net_G_A.pth \
      --restore_pretrained_G_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_A_net_G_A.pth \
      --restore_D_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_A_net_D_A.pth \
      --real_stat_path real_stat/face2smile_B.npz \
      --dataset_mode unaligned \
      --distiller inception \
      --gan_mode lsgan \
      --nepochs 160 \
      --nepochs_decay 160 \
      --teacher_netG inception_9blocks --student_netG inception_9blocks \
      --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 20 \
      --ndf 64 \
      --num_threads 2 \
      --eval_batch_size 2 \
      --batch_size 4 \
      --gpu_ids 0 \
      --norm_affine \
      --norm_affine_D \
      --channels_reduction_factor 6 \
      --kernel_sizes 1 3 5 \
      --lambda_distill 1.0 \
      --lambda_recon 5 \
      --prune_cin_lb 16 \
      --target_flops 2.6e9 \
      --distill_G_loss_type ka \
      --crop_size 512,256 \
      --preprocess resize_and_crop \
      --load_size 600 \
      --save_epoch_freq 1 \
      --save_latest_freq 500 \
      --direction BtoA \
      --norm_student batch \
      --padding_type_student zero \
      --norm_affine_student \
      --norm_track_running_stats_student
    

So now I am trying to train more, as below. Do you think I could get working results with these?

     --epoch_base 261 \ #best_net_epoch
     --nepochs 240 --nepochs_decay 500 \
    

    About the blurring.

My dataset dimension is 178x218, centered on the head, and I see the training model's input dimension is 256x256. Do I need to do some data preparation for that, like upscaling the training data? I saw the distilling options for horse2zebra:

      --crop_size 512,256 \
      --preprocess resize_and_crop \
      --load_size 600 \
    

In my case, I probably want the face-crop texture only, so I am thinking of using default options like the ones below.

      --crop_size 286 \
      --preprocess resize_and_crop \
      --load_size 256, 256 \
    

    Do you think this is the right approach?

    Sorry for so many questions, but I wanted to make my model work and just listed all the possible questions that might be related. Please let me know if you have any opinion on how to make the model work better.

    Best, Youjin

    opened by youjinChung 1
  • I don't see `--teacher_netG` or `--student_netG` option in distill_options.

Hello, I was running the img-to-img translation notebook example, and when I tried to train, I got the error below:

    --teacher_netG inception_9blocks --student_netG inception_9blocks \
                                       ^
    SyntaxError: invalid syntax
    

and this is the training block:

    NUM_EPOCHS = 50 
    
    !python distill.py --dataroot ./database/ukiyoe2photo \
    --log_dir logs/cycle_gan/ukiyoe2photo/inception/student/2p6B \
    --restore_teacher_G_path ukiyo_teacher_iter68000_net_G_B.pth \
    --restore_pretrained_G_path ukiyo_teacher_iter68000_net_G_B.pth \
    --restore_D_path ukiyo_teacher_iter68000_net_D_B.pth \
    --real_stat_path ukiyo_A.npz \
    --dataset_mode unaligned \
    --distiller inception \
    --gan_mode lsgan \
    --nepochs NUM_EPOCHS --nepochs_decay NUM_EPOCHS \ 
    --netG inception_9blocks \
    --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 20 \
    --ndf 64 \
    --num_threads 2 \
    --eval_batch_size 2 \
    --batch_size 10 \
    --gpu_ids 0 \
    --norm_affine \
    --norm_affine_D \
    --channels_reduction_factor 6 \
    --kernel_sizes 1 3 5 \
    --lambda_distill 2.8 \
    --lambda_recon 1000 \
    --prune_cin_lb 16 \
    --target_flops 2.6e9 \
    --distill_G_loss_type ka \
    --netD multi_scale \
    --crop_size 512,256 \
    --preprocess resize_and_crop \
    --load_size 600 \
    --save_epoch_freq 1 \
    --save_latest_freq 500 \
    --direction BtoA \
    --norm_student batch \
    --padding_type_student zero \
    --norm_affine_student \
    --norm_track_running_stats_student \
    

I changed it to --netG inception_9blocks, but it still shows the same error. I checked the script files in the repo, and all the options are still --teacher_netG. Could you let me know how to make it work?

    opened by youjinChung 1
  • AssertionError: Invalid device id

Hello, I encountered an error in model training: “AssertionError: Invalid device id”. What should I do? Do you have any suggestions? Thank you very much!

    opened by XUEXI-CL 1
  • Distilling doesn't work as expected.

    Hello,

Since my last question, #24, I tried 512x512-resolution training for both the teacher and student models. I found that the teacher model at 512x512 works fine, but student training is not working. I wonder if I can get some hints as to why.

[Teacher fake image] [Student fake image (epoch 274/1000)]

Training options:

    !python train.py --dataroot database/face2smile \
      --model cycle_gan \
      --log_dir logs/cycle_gan/face2smile/teacher_512 \
      --netG inception_9blocks \
      --real_stat_A_path real_stat_512/face2smile_A.npz \
      --real_stat_B_path real_stat_512/face2smile_B.npz \
      --batch_size 4 \
      --num_threads 32 \
      --gpu_ids 0,1,2,3 \
      --norm_affine \
      --norm_affine_D \
      --channels_reduction_factor 6 \
      --kernel_sizes 1 3 5 \
      --save_latest_freq 10000 --save_epoch_freq 5 \
      --epoch_base 176 --iter_base 223395 \
      --nepochs 324 --nepochs_decay 500 \
      --preprocess scale_width --load_size 512 \
    
    !python distill.py --dataroot database/face2smile \
      --dataset_mode unaligned \
      --distiller inception \
      --gan_mode lsgan \
      --log_dir logs/cycle_gan/face2smile/student_512 \
      --restore_teacher_G_path logs/cycle_gan/face2smile/teacher_512/checkpoints/170_net_G_A.pth \
      --restore_pretrained_G_path logs/cycle_gan/face2smile/teacher_512/checkpoints/170_net_G_A.pth \
      --restore_D_path logs/cycle_gan/face2smile/teacher_512/checkpoints/170_net_D_A.pth \
      --real_stat_path real_stat_512/face2smile_B.npz \
      --teacher_netG inception_9blocks --student_netG inception_9blocks \
      --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 20 \
      --ndf 64 \
      --num_threads 32 \
      --eval_batch_size 4 \
      --batch_size 32 \
      --gpu_ids 0,1,2,3 \
      --norm_affine \
      --norm_affine_D \
      --channels_reduction_factor 6 \
      --kernel_sizes 1 3 5 \
      --lambda_distill 1.0 \
      --lambda_recon 5 \
      --prune_cin_lb 16 \
      --target_flops 2.6e9 \
      --distill_G_loss_type ka \
      --preprocess scale_width --load_size 512 \
      --save_epoch_freq 2 --save_latest_freq 1000 \
      --nepochs 500 --nepochs_decay 500 \
      --norm_student batch \
      --padding_type_student zero \
      --norm_affine_student \
      --norm_track_running_stats_student
    
    opened by youjinChung 12
  • CUDA out of memory

Update: I wonder if a repo update with torch.amp or mixed precision applied is possible.

    Hello,

I'd like to ask whether, with my dataset and machine, it is normal to see out-of-memory errors, or whether I might have a programming issue.

    RuntimeError: CUDA out of memory. Tried to allocate 260.00 MiB (GPU 0; 15.78 GiB total capacity; 14.02 GiB already allocated; 198.19 MiB free; 14.13 GiB reserved in total by PyTorch)

I am using an AWS p3.8xlarge (4 Tesla V100s), trying to train a CycleGAN with 5055 images at 1024x1024 resolution.

I checked that a dataset resized to 512x512 works with batch size 4, but with 1024x1024, even batch size 1 doesn't work.

I think we need a p4d.24xlarge for this project, but it's hard to get that instance due to a lack of zone capacity.

Possible things to try:

  • reduce the size of the dataset (but I think 5055 images is still small for training); my colleague thinks the model loads the whole dataset at once, and that's why we would need to reduce it
  • find a memory leak?

Any comments or hints are appreciated.

Below is the log for reference.

    train.py --dataroot database/face2smile \
    >   --model cycle_gan \
    >   --log_dir logs/cycle_gan/face2smile/teacher_1080 \
    >   --netG inception_9blocks \
    >   --real_stat_A_path real_stat_1080/face2smile_A.npz \
    >   --real_stat_B_path real_stat_1080/face2smile_B.npz \
    >   --batch_size 1 \
    >   --num_threads 1 \
    >   --gpu_ids 0,1,2,3 \
    >   --norm_affine \
    >   --norm_affine_D \
    >   --channels_reduction_factor 6 \
    >   --kernel_sizes 1 3 5 \
    >   --save_latest_freq 10000 --save_epoch_freq 5 \
    >   --nepochs 1 --nepochs_decay 0 \
    >   --preprocess none
    ----------------- Options ---------------
                    active_fn: nn.ReLU                       
                  active_fn_D: nn.LeakyReLU                  
                 aspect_ratio: 1.0                           
                   batch_size: 4                             	[default: 1]
                        beta1: 0.5                           
                     channels: None                          
    channels_reduction_factor: 6                             	[default: 1]
              cityscapes_path: database/cityscapes-origin    
                    crop_size: 256, 256                      
                     dataroot: database/face2smile           	[default: None]
                 dataset_mode: unaligned                     
                    direction: AtoB                          
              display_winsize: 256                           
                     drn_path: drn-d-105_ms_cityscapes.pth   
                 dropout_rate: 0                             
                   epoch_base: 1                             
              eval_batch_size: 1                             
                     gan_mode: lsgan                         
                      gpu_ids: 0,1,2,3                       	[default: 0]
                    init_gain: 0.02                          
                    init_type: normal                        
                     input_nc: 3                             
                      isTrain: True                          	[default: None]
                    iter_base: 1                             
                 kernel_sizes: [1, 3, 5]                     	[default: [3, 5, 7]]
                     lambda_A: 10.0                          
                     lambda_B: 10.0                          
              lambda_identity: 0.5                           
               load_in_memory: False                         
                    load_size: 286                           
                      log_dir: logs/cycle_gan/face2smile/teacher_1080	[default: logs]
                           lr: 0.0002                        
               lr_decay_iters: 50                            
                    lr_policy: linear                        
             max_dataset_size: -1                            
                        model: cycle_gan                     	[default: pix2pix]
         moving_average_decay: 0.0                           
    moving_average_decay_adjust: False                         
    moving_average_decay_base_batch: 32                            
                   n_layers_D: 3                             
                          ndf: 64                            
                      nepochs: 1                             	[default: 100]
                nepochs_decay: 0                             	[default: 100]
                         netD: n_layers                      
                         netG: inception_9blocks             
                          ngf: 64                            
                      no_flip: False                         
                         norm: instance                      
                  norm_affine: True                          	[default: False]
                norm_affine_D: True                          	[default: False]
                 norm_epsilon: 1e-05                         
                norm_momentum: 0.1                           
                 norm_student: instance                      
     norm_track_running_stats: False                         
                  num_threads: 32                            	[default: 4]
                    output_nc: 3                             
                 padding_type: reflect                       
                        phase: train                         
                    pool_size: 50                            
                   preprocess: none                          	[default: resize_and_crop]
                   print_freq: 100                           
             real_stat_A_path: real_stat_1080/face2smile_A.npz	[default: None]
             real_stat_B_path: real_stat_1080/face2smile_B.npz	[default: None]
             restore_D_A_path: None                          
             restore_D_B_path: None                          
             restore_G_A_path: None                          
             restore_G_B_path: None                          
               restore_O_path: None                          
              save_epoch_freq: 5                             	[default: 20]
             save_latest_freq: 10000                         	[default: 20000]
                         seed: 233                           
               serial_batches: False                         
                   table_path: datasets/table.txt            
              tensorboard_dir: None                          
    ----------------- End -------------------
    train.py --dataroot database/face2smile --model cycle_gan --log_dir logs/cycle_gan/face2smile/teacher_1080 --netG inception_9blocks --real_stat_A_path real_stat_1080/face2smile_A.npz --real_stat_B_path real_stat_1080/face2smile_B.npz --batch_size 4 --num_threads 32 --gpu_ids 0,1,2,3 --norm_affine --norm_affine_D --channels_reduction_factor 6 --kernel_sizes 1 3 5 --save_latest_freq 10000 --save_epoch_freq 5 --nepochs 1 --nepochs_decay 0 --preprocess none
    dataset [UnalignedDataset] was created
    The number of training images = 5055
    data shape is: channel=3, height=1024, width=1024.
    initialize network with normal
    initialize network with normal
    initialize network with normal
    initialize network with normal
    dataset [SingleDataset] was created
    dataset [SingleDataset] was created
    /home/ubuntu/.local/lib/python3.9/site-packages/torchvision/models/inception.py:80: FutureWarning: The default weight initialization of inception_v3 will be changed in future releases of torchvision. If you wish to keep the old behavior (which leads to long initialization times due to scipy/scipy#11299), please set init_weights=True.
      warnings.warn('The default weight initialization of inception_v3 will be changed in future releases of '
    model [CycleGANModel] was created
    ---------- Networks initialized -------------
    DataParallel(
      (module): InceptionGenerator(
        (down_sampling): Sequential(
          (0): ReflectionPad2d((3, 3, 3, 3))
          (1): Conv2d(3, 64, kernel_size=(7, 7), stride=(1, 1))
          (2): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (3): ReLU(inplace=True)
          (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (5): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (6): ReLU(inplace=True)
          (7): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (8): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (9): ReLU(inplace=True)
        )
        (features): Sequential(
          (0): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
          (1): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
          (2): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
          (3): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
          (4): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
          (5): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
          (6): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
          (7): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
          (8): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
        )
        (up_sampling): Sequential(
          (0): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
          (1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (2): ReLU(inplace=True)
          (3): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
          (4): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (5): ReLU(inplace=True)
          (6): ReflectionPad2d((3, 3, 3, 3))
          (7): Conv2d(64, 3, kernel_size=(7, 7), stride=(1, 1))
          (8): Tanh()
        )
      )
    )
    [Network G_A] Total number of parameters : 8.154 M
    DataParallel(
      (module): InceptionGenerator(
        (down_sampling): Sequential(
          (0): ReflectionPad2d((3, 3, 3, 3))
          (1): Conv2d(3, 64, kernel_size=(7, 7), stride=(1, 1))
          (2): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (3): ReLU(inplace=True)
          (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (5): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (6): ReLU(inplace=True)
          (7): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (8): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (9): ReLU(inplace=True)
        )
        (features): Sequential(
          (0): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
          (1): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
          (2): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
          (3): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
          (4): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
          (5): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
          (6): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
          (7): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
          (8): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
        )
        (up_sampling): Sequential(
          (0): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
          (1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (2): ReLU(inplace=True)
          (3): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
          (4): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (5): ReLU(inplace=True)
          (6): ReflectionPad2d((3, 3, 3, 3))
          (7): Conv2d(64, 3, kernel_size=(7, 7), stride=(1, 1))
          (8): Tanh()
        )
      )
    )
    [Network G_B] Total number of parameters : 8.154 M
    DataParallel(
      (module): NLayerDiscriminator(
        (model): Sequential(
          (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
          (1): LeakyReLU(negative_slope=0.2, inplace=True)
          (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
          (3): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (4): LeakyReLU(negative_slope=0.2, inplace=True)
          (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
          (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (7): LeakyReLU(negative_slope=0.2, inplace=True)
          (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
          (9): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (10): LeakyReLU(negative_slope=0.2, inplace=True)
          (11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
        )
      )
    )
    [Network D_A] Total number of parameters : 2.767 M
    DataParallel(
      (module): NLayerDiscriminator(
        (model): Sequential(
          (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
          (1): LeakyReLU(negative_slope=0.2, inplace=True)
          (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
          (3): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (4): LeakyReLU(negative_slope=0.2, inplace=True)
          (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
          (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (7): LeakyReLU(negative_slope=0.2, inplace=True)
          (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
          (9): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
          (10): LeakyReLU(negative_slope=0.2, inplace=True)
          (11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
        )
      )
    )
    [Network D_B] Total number of parameters : 2.767 M
    -----------------------------------------------
    start_epoch: 1
    end_epoch: 1
    total_iter: 1
    current memory allocated: 265.4296875
    max memory allocated: 265.4296875
    cached memory: 276.0
    will set input data
    Traceback (most recent call last):
      File "/data/CAT/train.py", line 14, in <module>
        trainer.start()
      File "/data/CAT/trainer.py", line 159, in start
        model.optimize_parameters(total_iter)
      File "/data/CAT/models/cycle_gan_model.py", line 295, in optimize_parameters
        self.forward()
      File "/data/CAT/models/cycle_gan_model.py", line 235, in forward
        self.rec_A = self.netG_B(self.fake_B)
      File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
        outputs = self.parallel_apply(replicas, inputs, kwargs)
      File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
      File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
        output.reraise()
      File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/_utils.py", line 429, in reraise
        raise self.exc_type(msg)
    RuntimeError: Caught RuntimeError in replica 0 on device 0.
    Original Traceback (most recent call last):
      File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
        output = module(*input, **kwargs)
      File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/data/CAT/models/modules/inception_architecture/inception_generator.py", line 141, in forward
        res = self.up_sampling(res)
      File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/container.py", line 119, in forward
        input = module(input)
      File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/padding.py", line 173, in forward
        return F.pad(input, self.padding, 'reflect')
      File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/functional.py", line 4014, in _pad
        return torch._C._nn.reflection_pad2d(input, pad)
    RuntimeError: CUDA out of memory. Tried to allocate 260.00 MiB (GPU 0; 15.78 GiB total capacity; 14.02 GiB already allocated; 198.19 MiB free; 14.13 GiB reserved in total by PyTorch)
    
    
    opened by youjinChung 0
  • Key Error in training pix2pix with unaligned data

I have been trying to train a pix2pix model on the ukiyoe2photo dataset, as given in this notebook by SnapML. The create_eval_dataloader function is giving a KeyError with the following log:

    Traceback (most recent call last):
      File "train.py", line 14, in <module>
        trainer.start()
      File "/content/CAT/trainer.py", line 328, in start
        (epoch, total_iter))
      File "/content/CAT/trainer.py", line 272, in evaluate
        metrics = self.model.evaluate_model(iter)
      File "/content/CAT/models/pix2pix_model.py", line 507, in evaluate_model
        self.set_input(data_i)
      File "/content/CAT/models/pix2pix_model.py", line 439, in set_input
        self.real_A = input['A' if AtoB else 'B'].to(self.device)
    KeyError: 'B'

It seems the data produced by the eval_dataloader contains only 2 keys: 'A' and 'A_paths'. Apparently it has no 'B' and 'B_paths' fields. It would be great if you could help me with this.

    opened by Ashish-Abraham 0
  • `restore` options to resume `distill.py`

    Hello,

I got this error when exporting to ONNX:

    Traceback (most recent call last):
      File "/home/ubuntu/CAT/onnx_export.py", line 13, in <module>
        exporter = Exporter()
      File "/home/ubuntu/CAT/onnx_exporter.py", line 59, in __init__
        model.netG_student.load_state_dict(
      File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for InceptionGenerator:
    	size mismatch for down_sampling.7.weight: copying a param with shape torch.Size([234, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([230, 16, 3, 3]).
    	size mismatch for down_sampling.7.bias: copying a param with shape torch.Size([234]) from checkpoint, the shape in current model is torch.Size([230]).
    ...
    and so many mismatches
    

It is so weird, since I checked that it worked before and actually exported models. Let me share my commands for distilling and exporting:

    !python distill.py --dataroot database/face2smile \
      --dataset_mode unaligned \
      --distiller inception \
      --gan_mode lsgan \
      --log_dir logs/cycle_gan/face2smile/inception/student/2p6B \
      --restore_teacher_G_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_B_net_G_A.pth \
      --restore_pretrained_G_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_B_net_G_A.pth \
      --restore_D_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_B_net_D_A.pth \
      --real_stat_path real_stat/face2smile_B.npz \
      --nepochs 500 --nepochs_decay 500 \
      --teacher_netG inception_9blocks --student_netG inception_9blocks \
      --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 20 \
      --ndf 64 \
      --num_threads 80 \
      --eval_batch_size 4 \
      --batch_size 80 \
      --gpu_ids 0,1,2,3 \
      --norm_affine \
      --norm_affine_D \
      --channels_reduction_factor 6 \
      --kernel_sizes 1 3 5 \
      --lambda_distill 1.0 \
      --lambda_recon 5 \
      --prune_cin_lb 16 \
      --target_flops 2.6e9 \
      --distill_G_loss_type ka \
      --save_epoch_freq 1 \
      --save_latest_freq 500 \
      --norm_student batch \
      --padding_type_student zero \
      --norm_affine_student \
      --norm_track_running_stats_student
    
    !python3 onnx_export.py --dataroot database/face2smile \
      --log_dir onnx_files/cycle_gan/face2smile/inception/student/2p6B \
      --restore_teacher_G_path logs/cycle_gan/face2smile/inception/teacher/checkpoints/best_A_net_G_A.pth \
      --pretrained_student_G_path logs/cycle_gan/face2smile/inception/student/2p6B/checkpoints/best_net_G.pth \
      --real_stat_path real_stat/face2smile_B.npz \
       --dataset_mode unaligned \
      --pretrained_ngf 64 --teacher_ngf 64 --student_ngf 20 \
      --gpu_ids 0 \
      --norm_affine \
      --channels_reduction_factor 6 \
      --kernel_sizes 1 3 5 \
      --prune_cin_lb 16 \
      --target_flops 2.6e9 \
      --ndf 64 \
      --batch_size 8 \
      --eval_batch_size 2 \
      --num_threads 8 \
      --norm_affine_D \
      --teacher_netG inception_9blocks --student_netG inception_9blocks \
      --distiller inception \
      --gan_mode lsgan \
      --norm_student batch \
      --padding_type_student zero \
      --norm_affine_student \
      --norm_track_running_stats_student
    
    opened by youjinChung 7