Overview

guided-diffusion

This is the codebase for Diffusion Models Beat GANs on Image Synthesis.

This repository is based on openai/improved-diffusion, with modifications for classifier conditioning and architecture improvements.

Usage

Training diffusion models is described in the parent repository. Training a classifier is similar. We assume you have put training hyperparameters into a TRAIN_FLAGS variable, and classifier hyperparameters into a CLASSIFIER_FLAGS variable. Then you can run:

mpiexec -n N python scripts/classifier_train.py --data_dir path/to/imagenet $TRAIN_FLAGS $CLASSIFIER_FLAGS

Make sure to divide the batch size in TRAIN_FLAGS by the number of MPI processes you are using, since --batch_size is per process. For example, with --batch_size 256 and mpiexec -n 8, pass --batch_size 32 to keep the effective global batch size at 256.

Here are flags for training the 128x128 classifier. You can modify these for training classifiers at other resolutions:

TRAIN_FLAGS="--iterations 300000 --anneal_lr True --batch_size 256 --lr 3e-4 --save_interval 10000 --weight_decay 0.05"
CLASSIFIER_FLAGS="--image_size 128 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True"

Here are flags for sampling from a 128x128 classifier-guided model with 25-step DDIM:

MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond True --image_size 128 --learn_sigma True --num_channels 256 --num_heads 4 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
CLASSIFIER_FLAGS="--image_size 128 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True --classifier_scale 1.0 --classifier_use_fp16 True"
SAMPLE_FLAGS="--batch_size 4 --num_samples 50000 --timestep_respacing ddim25 --use_ddim True"
mpiexec -n N python scripts/classifier_sample.py \
    --model_path /path/to/model.pt \
    --classifier_path path/to/classifier.pt \
    $MODEL_FLAGS $CLASSIFIER_FLAGS $SAMPLE_FLAGS
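
The --classifier_scale flag sets how strongly the classifier steers sampling. As a rough illustration of the mechanism (a minimal sketch loosely paraphrasing the conditioning function in scripts/classifier_sample.py, with names simplified, not a drop-in replacement): each denoising step is shifted by the gradient of the classifier's log-probability of the target class, scaled by classifier_scale.

import torch
import torch.nn.functional as F

def cond_fn(x, t, y, classifier, classifier_scale=1.0):
    # Gradient of log p(y | x_t) with respect to the noisy input x_t.
    with torch.enable_grad():
        x_in = x.detach().requires_grad_(True)
        logits = classifier(x_in, t)
        log_probs = F.log_softmax(logits, dim=-1)
        selected = log_probs[range(len(logits)), y.view(-1)]
        return torch.autograd.grad(selected.sum(), x_in)[0] * classifier_scale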

To sample for 250 timesteps without DDIM, replace --timestep_respacing ddim25 with --timestep_respacing 250, and replace --use_ddim True with --use_ddim False.
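
Under the hood, the respacing string selects a subset of the original diffusion steps. A minimal sketch, assuming the guided_diffusion package layout (respace.py provides space_timesteps):

from guided_diffusion.respace import space_timesteps

# "ddim25" picks 25 evenly strided steps out of 1000 for DDIM sampling,
# while "250" rescales the full 1000-step schedule down to 250 steps.
print(sorted(space_timesteps(1000, "ddim25")))
print(len(space_timesteps(1000, "250")))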

Comments
  • Resume training does not work for multi-GPU training

    I added --resume_checkpoint $path_to_checkpoint$ to continue the training. It works on a single GPU, but hangs with multiple GPUs (see the sketch after this issue).

    The code gets stuck here:

    Logging to /proj/ihorse_2021/users/x_manni/guided-diffusion/log9
    creating model and diffusion...
    creating data loader...
    start training...
    loading model from checkpoint: /proj/ihorse_2021/users/x_manni/guided-diffusion/log9/model200000.pt...

    opened by forever208 · 6 comments
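
    A plausible thing to check, assuming the hang is a stalled collective: when resuming, every rank must reach the same broadcast while the checkpoint loads. A minimal sketch of that pattern (a hypothetical helper, loosely modeled on guided_diffusion/dist_util.py, not the repository's actual fix):

    import io

    import torch
    from mpi4py import MPI

    def load_checkpoint_all_ranks(path):
        # Rank 0 reads the file; every rank receives the bytes via broadcast.
        comm = MPI.COMM_WORLD
        data = None
        if comm.Get_rank() == 0:
            with open(path, "rb") as f:
                data = f.read()
        # If any rank skips this collective call, the others hang right here.
        data = comm.bcast(data)
        return torch.load(io.BytesIO(data), map_location="cpu")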
  • Training across multiple nodes does not work

    If the number of GPUs is greater than 8 (each node has 8 GPUs), then I have to train across several nodes.

    In this case, running mpiexec -n 16 python scripts/image_train.py doesn't work.

    It fails with an NCCL error (see the sketch after this issue).

    opened by forever208 · 4 comments
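
    When a multi-node NCCL run fails, a common first step is to confirm NCCL is available in the PyTorch build and turn on its logging. A minimal sketch, using standard NCCL environment variables (the interface name is a placeholder for your cluster's):

    import os

    import torch.distributed as dist

    # NCCL must be compiled into this PyTorch build for multi-node GPU training.
    print("NCCL available:", dist.is_nccl_available())

    # Standard NCCL debugging knobs: verbose setup logs, and pinning the
    # network interface NCCL uses for cross-node traffic.
    os.environ["NCCL_DEBUG"] = "INFO"
    os.environ["NCCL_SOCKET_IFNAME"] = "eth0"  # hypothetical interface name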
  • Figures Appendix C of the paper

    Hello,

    Congrats on the great work and thanks for sharing the code!

    I wonder how you generated Figure 8 of Appendix C in your paper?

    How are the images so similar, even when the scale is so reduced? Is there any conditioning beyond classifier guidance?

    Many thanks!

    opened by SANCHES-Pedro · 4 comments
  • What is the 'zero_module' used for? [Question]

    First of all, thanks for an extraordinary paper - so many interesting details!! Also, thanks for open sourcing the code.

    I have a few ideas I want to test and I'm trying to understand all the parts of the code. Most of it is clear and well commented, but I can't seem to figure out the reasoning behind the 'zero_module' you have in a few places in the guided-diffusion/guided_diffusion/unet.py file?

    def zero_module(module):
        """
        Zero out the parameters of a module and return it.
        """
        for p in module.parameters():
        p.detach().zero_()  # zero the weights in place, outside autograd tracking
        return module
    

    I couldn't find anything in the paper or online to explain why this is used (see the illustration after this issue).

    I'm also curious why you used a custom mixed-precision training implementation instead of PyTorch's native one (torch.cuda.amp.autocast)?

    opened by santodante · 3 comments
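
    One hedged reading, not an authoritative answer: zeroing the last layer of each residual branch makes every residual block an identity map at initialization, a common trick for stabilizing very deep networks. A minimal, self-contained illustration:

    import torch
    import torch.nn as nn

    def zero_module(module):
        for p in module.parameters():
            p.detach().zero_()
        return module

    # A residual branch whose final conv is zero-initialized outputs exactly
    # zero, so x + branch(x) starts out as the identity map.
    branch = nn.Sequential(
        nn.Conv2d(8, 8, 3, padding=1),
        nn.SiLU(),
        zero_module(nn.Conv2d(8, 8, 3, padding=1)),
    )
    x = torch.randn(1, 8, 16, 16)
    assert torch.equal(x + branch(x), x)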
  • Error when loading pretrained weights

    Thank you for releasing pretrained weights. I tried to use some of your pretrained weights as described in the README, but there is a mismatch between the checkpoint weights and the model (see the diagnostic sketch after this issue).

    Logging to /tmp/openai-2021-07-22-14-28-52-986510 creating model and diffusion... Traceback (most recent call last): File "scripts/image_sample.py", line 108, in main() File "scripts/image_sample.py", line 33, in main model.load_state_dict( File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( RuntimeError: Error(s) in loading state_dict for UNetModel: Missing key(s) in state_dict: "input_blocks.3.0.op.weight", "input_blocks.3.0.op.bias", "input_blocks.4.0.skip_connection.weight", "input_blocks.4.0.skip_connection.bias", "input_blocks.6.0.op.weight", "input_blocks.6.0.op.bias", "input_blocks.7.1.norm.weight", "input_blocks.7.1.norm.bias", "input_blocks.7.1.qkv.weight", "input_blocks.7.1.qkv.bias", "input_blocks.7.1.proj_out.weight", "input_blocks.7.1.proj_out.bias", "input_blocks.8.1.norm.weight", "input_blocks.8.1.norm.bias", "input_blocks.8.1.qkv.weight", "input_blocks.8.1.qkv.bias", "input_blocks.8.1.proj_out.weight", "input_blocks.8.1.proj_out.bias", "input_blocks.9.0.op.weight", "input_blocks.9.0.op.bias", "input_blocks.10.0.skip_connection.weight", "input_blocks.10.0.skip_connection.bias", "output_blocks.2.2.conv.weight", "output_blocks.2.2.conv.bias", "output_blocks.5.2.conv.weight", "output_blocks.5.2.conv.bias", "output_blocks.8.1.conv.weight", "output_blocks.8.1.conv.bias". Unexpected key(s) in state_dict: "input_blocks.12.0.in_layers.0.weight", "input_blocks.12.0.in_layers.0.bias", "input_blocks.12.0.in_layers.2.weight", "input_blocks.12.0.in_layers.2.bias", "input_blocks.12.0.emb_layers.1.weight", "input_blocks.12.0.emb_layers.1.bias", "input_blocks.12.0.out_layers.0.weight", "input_blocks.12.0.out_layers.0.bias", "input_blocks.12.0.out_layers.3.weight", "input_blocks.12.0.out_layers.3.bias", "input_blocks.13.0.in_layers.0.weight", "input_blocks.13.0.in_layers.0.bias", "input_blocks.13.0.in_layers.2.weight", "input_blocks.13.0.in_layers.2.bias", "input_blocks.13.0.emb_layers.1.weight", "input_blocks.13.0.emb_layers.1.bias", "input_blocks.13.0.out_layers.0.weight", "input_blocks.13.0.out_layers.0.bias", "input_blocks.13.0.out_layers.3.weight", "input_blocks.13.0.out_layers.3.bias", "input_blocks.13.0.skip_connection.weight", "input_blocks.13.0.skip_connection.bias", "input_blocks.13.1.norm.weight", "input_blocks.13.1.norm.bias", "input_blocks.13.1.qkv.weight", "input_blocks.13.1.qkv.bias", "input_blocks.13.1.proj_out.weight", "input_blocks.13.1.proj_out.bias", "input_blocks.14.0.in_layers.0.weight", "input_blocks.14.0.in_layers.0.bias", "input_blocks.14.0.in_layers.2.weight", "input_blocks.14.0.in_layers.2.bias", "input_blocks.14.0.emb_layers.1.weight", "input_blocks.14.0.emb_layers.1.bias", "input_blocks.14.0.out_layers.0.weight", "input_blocks.14.0.out_layers.0.bias", "input_blocks.14.0.out_layers.3.weight", "input_blocks.14.0.out_layers.3.bias", "input_blocks.14.1.norm.weight", "input_blocks.14.1.norm.bias", "input_blocks.14.1.qkv.weight", "input_blocks.14.1.qkv.bias", "input_blocks.14.1.proj_out.weight", "input_blocks.14.1.proj_out.bias", "input_blocks.15.0.in_layers.0.weight", "input_blocks.15.0.in_layers.0.bias", "input_blocks.15.0.in_layers.2.weight", "input_blocks.15.0.in_layers.2.bias", "input_blocks.15.0.emb_layers.1.weight", "input_blocks.15.0.emb_layers.1.bias", "input_blocks.15.0.out_layers.0.weight", "input_blocks.15.0.out_layers.0.bias", "input_blocks.15.0.out_layers.3.weight", "input_blocks.15.0.out_layers.3.bias", 
"input_blocks.16.0.in_layers.0.weight", "input_blocks.16.0.in_layers.0.bias", "input_blocks.16.0.in_layers.2.weight", "input_blocks.16.0.in_layers.2.bias", "input_blocks.16.0.emb_layers.1.weight", "input_blocks.16.0.emb_layers.1.bias", "input_blocks.16.0.out_layers.0.weight", "input_blocks.16.0.out_layers.0.bias", "input_blocks.16.0.out_layers.3.weight", "input_blocks.16.0.out_layers.3.bias", "input_blocks.16.1.norm.weight", "input_blocks.16.1.norm.bias", "input_blocks.16.1.qkv.weight", "input_blocks.16.1.qkv.bias", "input_blocks.16.1.proj_out.weight", "input_blocks.16.1.proj_out.bias", "input_blocks.17.0.in_layers.0.weight", "input_blocks.17.0.in_layers.0.bias", "input_blocks.17.0.in_layers.2.weight", "input_blocks.17.0.in_layers.2.bias", "input_blocks.17.0.emb_layers.1.weight", "input_blocks.17.0.emb_layers.1.bias", "input_blocks.17.0.out_layers.0.weight", "input_blocks.17.0.out_layers.0.bias", "input_blocks.17.0.out_layers.3.weight", "input_blocks.17.0.out_layers.3.bias", "input_blocks.17.1.norm.weight", "input_blocks.17.1.norm.bias", "input_blocks.17.1.qkv.weight", "input_blocks.17.1.qkv.bias", "input_blocks.17.1.proj_out.weight", "input_blocks.17.1.proj_out.bias", "input_blocks.3.0.in_layers.0.weight", "input_blocks.3.0.in_layers.0.bias", "input_blocks.3.0.in_layers.2.weight", "input_blocks.3.0.in_layers.2.bias", "input_blocks.3.0.emb_layers.1.weight", "input_blocks.3.0.emb_layers.1.bias", "input_blocks.3.0.out_layers.0.weight", "input_blocks.3.0.out_layers.0.bias", "input_blocks.3.0.out_layers.3.weight", "input_blocks.3.0.out_layers.3.bias", "input_blocks.6.0.in_layers.0.weight", "input_blocks.6.0.in_layers.0.bias", "input_blocks.6.0.in_layers.2.weight", "input_blocks.6.0.in_layers.2.bias", "input_blocks.6.0.emb_layers.1.weight", "input_blocks.6.0.emb_layers.1.bias", "input_blocks.6.0.out_layers.0.weight", "input_blocks.6.0.out_layers.0.bias", "input_blocks.6.0.out_layers.3.weight", "input_blocks.6.0.out_layers.3.bias", "input_blocks.9.0.in_layers.0.weight", "input_blocks.9.0.in_layers.0.bias", "input_blocks.9.0.in_layers.2.weight", "input_blocks.9.0.in_layers.2.bias", "input_blocks.9.0.emb_layers.1.weight", "input_blocks.9.0.emb_layers.1.bias", "input_blocks.9.0.out_layers.0.weight", "input_blocks.9.0.out_layers.0.bias", "input_blocks.9.0.out_layers.3.weight", "input_blocks.9.0.out_layers.3.bias", "output_blocks.12.0.in_layers.0.weight", "output_blocks.12.0.in_layers.0.bias", "output_blocks.12.0.in_layers.2.weight", "output_blocks.12.0.in_layers.2.bias", "output_blocks.12.0.emb_layers.1.weight", "output_blocks.12.0.emb_layers.1.bias", "output_blocks.12.0.out_layers.0.weight", "output_blocks.12.0.out_layers.0.bias", "output_blocks.12.0.out_layers.3.weight", "output_blocks.12.0.out_layers.3.bias", "output_blocks.12.0.skip_connection.weight", "output_blocks.12.0.skip_connection.bias", "output_blocks.13.0.in_layers.0.weight", "output_blocks.13.0.in_layers.0.bias", "output_blocks.13.0.in_layers.2.weight", "output_blocks.13.0.in_layers.2.bias", "output_blocks.13.0.emb_layers.1.weight", "output_blocks.13.0.emb_layers.1.bias", "output_blocks.13.0.out_layers.0.weight", "output_blocks.13.0.out_layers.0.bias", "output_blocks.13.0.out_layers.3.weight", "output_blocks.13.0.out_layers.3.bias", "output_blocks.13.0.skip_connection.weight", "output_blocks.13.0.skip_connection.bias", "output_blocks.14.0.in_layers.0.weight", "output_blocks.14.0.in_layers.0.bias", "output_blocks.14.0.in_layers.2.weight", "output_blocks.14.0.in_layers.2.bias", "output_blocks.14.0.emb_layers.1.weight", 
"output_blocks.14.0.emb_layers.1.bias", "output_blocks.14.0.out_layers.0.weight", "output_blocks.14.0.out_layers.0.bias", "output_blocks.14.0.out_layers.3.weight", "output_blocks.14.0.out_layers.3.bias", "output_blocks.14.0.skip_connection.weight", "output_blocks.14.0.skip_connection.bias", "output_blocks.14.1.in_layers.0.weight", "output_blocks.14.1.in_layers.0.bias", "output_blocks.14.1.in_layers.2.weight", "output_blocks.14.1.in_layers.2.bias", "output_blocks.14.1.emb_layers.1.weight", "output_blocks.14.1.emb_layers.1.bias", "output_blocks.14.1.out_layers.0.weight", "output_blocks.14.1.out_layers.0.bias", "output_blocks.14.1.out_layers.3.weight", "output_blocks.14.1.out_layers.3.bias", "output_blocks.15.0.in_layers.0.weight", "output_blocks.15.0.in_layers.0.bias", "output_blocks.15.0.in_layers.2.weight", "output_blocks.15.0.in_layers.2.bias", "output_blocks.15.0.emb_layers.1.weight", "output_blocks.15.0.emb_layers.1.bias", "output_blocks.15.0.out_layers.0.weight", "output_blocks.15.0.out_layers.0.bias", "output_blocks.15.0.out_layers.3.weight", "output_blocks.15.0.out_layers.3.bias", "output_blocks.15.0.skip_connection.weight", "output_blocks.15.0.skip_connection.bias", "output_blocks.16.0.in_layers.0.weight", "output_blocks.16.0.in_layers.0.bias", "output_blocks.16.0.in_layers.2.weight", "output_blocks.16.0.in_layers.2.bias", "output_blocks.16.0.emb_layers.1.weight", "output_blocks.16.0.emb_layers.1.bias", "output_blocks.16.0.out_layers.0.weight", "output_blocks.16.0.out_layers.0.bias", "output_blocks.16.0.out_layers.3.weight", "output_blocks.16.0.out_layers.3.bias", "output_blocks.16.0.skip_connection.weight", "output_blocks.16.0.skip_connection.bias", "output_blocks.17.0.in_layers.0.weight", "output_blocks.17.0.in_layers.0.bias", "output_blocks.17.0.in_layers.2.weight", "output_blocks.17.0.in_layers.2.bias", "output_blocks.17.0.emb_layers.1.weight", "output_blocks.17.0.emb_layers.1.bias", "output_blocks.17.0.out_layers.0.weight", "output_blocks.17.0.out_layers.0.bias", "output_blocks.17.0.out_layers.3.weight", "output_blocks.17.0.out_layers.3.bias", "output_blocks.17.0.skip_connection.weight", "output_blocks.17.0.skip_connection.bias", "output_blocks.2.2.in_layers.0.weight", "output_blocks.2.2.in_layers.0.bias", "output_blocks.2.2.in_layers.2.weight", "output_blocks.2.2.in_layers.2.bias", "output_blocks.2.2.emb_layers.1.weight", "output_blocks.2.2.emb_layers.1.bias", "output_blocks.2.2.out_layers.0.weight", "output_blocks.2.2.out_layers.0.bias", "output_blocks.2.2.out_layers.3.weight", "output_blocks.2.2.out_layers.3.bias", "output_blocks.5.2.in_layers.0.weight", "output_blocks.5.2.in_layers.0.bias", "output_blocks.5.2.in_layers.2.weight", "output_blocks.5.2.in_layers.2.bias", "output_blocks.5.2.emb_layers.1.weight", "output_blocks.5.2.emb_layers.1.bias", "output_blocks.5.2.out_layers.0.weight", "output_blocks.5.2.out_layers.0.bias", "output_blocks.5.2.out_layers.3.weight", "output_blocks.5.2.out_layers.3.bias", "output_blocks.6.1.norm.weight", "output_blocks.6.1.norm.bias", "output_blocks.6.1.qkv.weight", "output_blocks.6.1.qkv.bias", "output_blocks.6.1.proj_out.weight", "output_blocks.6.1.proj_out.bias", "output_blocks.7.1.norm.weight", "output_blocks.7.1.norm.bias", "output_blocks.7.1.qkv.weight", "output_blocks.7.1.qkv.bias", "output_blocks.7.1.proj_out.weight", "output_blocks.7.1.proj_out.bias", "output_blocks.8.2.in_layers.0.weight", "output_blocks.8.2.in_layers.0.bias", "output_blocks.8.2.in_layers.2.weight", "output_blocks.8.2.in_layers.2.bias", 
"output_blocks.8.2.emb_layers.1.weight", "output_blocks.8.2.emb_layers.1.bias", "output_blocks.8.2.out_layers.0.weight", "output_blocks.8.2.out_layers.0.bias", "output_blocks.8.2.out_layers.3.weight", "output_blocks.8.2.out_layers.3.bias", "output_blocks.8.1.norm.weight", "output_blocks.8.1.norm.bias", "output_blocks.8.1.qkv.weight", "output_blocks.8.1.qkv.bias", "output_blocks.8.1.proj_out.weight", "output_blocks.8.1.proj_out.bias", "output_blocks.11.1.in_layers.0.weight", "output_blocks.11.1.in_layers.0.bias", "output_blocks.11.1.in_layers.2.weight", "output_blocks.11.1.in_layers.2.bias", "output_blocks.11.1.emb_layers.1.weight", "output_blocks.11.1.emb_layers.1.bias", "output_blocks.11.1.out_layers.0.weight", "output_blocks.11.1.out_layers.0.bias", "output_blocks.11.1.out_layers.3.weight", "output_blocks.11.1.out_layers.3.bias". size mismatch for time_embed.0.weight: copying a param with shape torch.Size([1024, 256]) from checkpoint, the shape in current model is torch.Size([512, 128]). size mismatch for time_embed.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for time_embed.2.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([512, 512]). size mismatch for time_embed.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.0.0.weight: copying a param with shape torch.Size([256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 3, 3, 3]). size mismatch for input_blocks.0.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for input_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([256, 512]). size mismatch for input_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for input_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). 
size mismatch for input_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for input_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([256, 512]). size mismatch for input_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for input_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.4.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.4.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.4.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]). size mismatch for input_blocks.4.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([512, 512]). size mismatch for input_blocks.5.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([512, 512]). size mismatch for input_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 256, 3, 3]). size mismatch for input_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 512]). size mismatch for input_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for input_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). 
size mismatch for input_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]). size mismatch for input_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.7.0.skip_connection.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 256, 1, 1]). size mismatch for input_blocks.7.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.8.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.8.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.8.0.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]). size mismatch for input_blocks.8.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.8.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 512]). size mismatch for input_blocks.8.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for input_blocks.8.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.8.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.8.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]). size mismatch for input_blocks.8.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.10.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.10.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.10.0.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 384, 3, 3]). size mismatch for input_blocks.10.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 512]). size mismatch for input_blocks.11.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 512]). 
size mismatch for middle_block.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for middle_block.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.0.emb_layers.1.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 512]). size mismatch for middle_block.0.emb_layers.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for middle_block.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.1.qkv.weight: copying a param with shape torch.Size([3072, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 512, 1]). size mismatch for middle_block.1.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for middle_block.1.proj_out.weight: copying a param with shape torch.Size([1024, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 1]). size mismatch for middle_block.1.proj_out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.2.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.2.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.2.in_layers.2.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for middle_block.2.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.2.emb_layers.1.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 512]). 
size mismatch for middle_block.2.emb_layers.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.2.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.2.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.2.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for middle_block.2.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.0.0.in_layers.0.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.0.in_layers.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]). size mismatch for output_blocks.0.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.0.0.emb_layers.1.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 512]). size mismatch for output_blocks.0.0.emb_layers.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.0.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.0.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for output_blocks.0.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.0.0.skip_connection.weight: copying a param with shape torch.Size([1024, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]). size mismatch for output_blocks.0.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.0.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.0.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.0.1.qkv.weight: copying a param with shape torch.Size([3072, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 512, 1]). 
size mismatch for output_blocks.0.1.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for output_blocks.0.1.proj_out.weight: copying a param with shape torch.Size([1024, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 1]). size mismatch for output_blocks.0.1.proj_out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]). size mismatch for output_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 512]). size mismatch for output_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for output_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.1.0.skip_connection.weight: copying a param with shape torch.Size([1024, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]). size mismatch for output_blocks.1.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.1.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.1.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.1.1.qkv.weight: copying a param with shape torch.Size([3072, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 512, 1]). size mismatch for output_blocks.1.1.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for output_blocks.1.1.proj_out.weight: copying a param with shape torch.Size([1024, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 1]). 
size mismatch for output_blocks.1.1.proj_out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([896]). size mismatch for output_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([896]). size mismatch for output_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 896, 3, 3]). size mismatch for output_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 512]). size mismatch for output_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for output_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.2.0.skip_connection.weight: copying a param with shape torch.Size([1024, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 896, 1, 1]). size mismatch for output_blocks.2.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.2.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.2.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.2.1.qkv.weight: copying a param with shape torch.Size([3072, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 512, 1]). size mismatch for output_blocks.2.1.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for output_blocks.2.1.proj_out.weight: copying a param with shape torch.Size([1024, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 1]). size mismatch for output_blocks.2.1.proj_out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.3.0.in_layers.0.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([896]). 
size mismatch for output_blocks.3.0.in_layers.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([896]). size mismatch for output_blocks.3.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 896, 3, 3]). size mismatch for output_blocks.3.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.3.0.emb_layers.1.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([768, 512]). size mismatch for output_blocks.3.0.emb_layers.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for output_blocks.3.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.3.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.3.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]). size mismatch for output_blocks.3.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.3.0.skip_connection.weight: copying a param with shape torch.Size([1024, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 896, 1, 1]). size mismatch for output_blocks.3.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.3.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.3.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.3.1.qkv.weight: copying a param with shape torch.Size([3072, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1152, 384, 1]). size mismatch for output_blocks.3.1.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([1152]). size mismatch for output_blocks.3.1.proj_out.weight: copying a param with shape torch.Size([1024, 1024, 1]) from checkpoint, the shape in current model is torch.Size([384, 384, 1]). size mismatch for output_blocks.3.1.proj_out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.4.0.in_layers.0.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for output_blocks.4.0.in_layers.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for output_blocks.4.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 768, 3, 3]). 
size mismatch for output_blocks.4.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.4.0.emb_layers.1.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([768, 512]). size mismatch for output_blocks.4.0.emb_layers.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for output_blocks.4.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.4.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.4.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]). size mismatch for output_blocks.4.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.4.0.skip_connection.weight: copying a param with shape torch.Size([1024, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 768, 1, 1]). size mismatch for output_blocks.4.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.4.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.4.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.4.1.qkv.weight: copying a param with shape torch.Size([3072, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1152, 384, 1]). size mismatch for output_blocks.4.1.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([1152]). size mismatch for output_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([1024, 1024, 1]) from checkpoint, the shape in current model is torch.Size([384, 384, 1]). size mismatch for output_blocks.4.1.proj_out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.5.0.in_layers.0.weight: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for output_blocks.5.0.in_layers.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for output_blocks.5.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 1536, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 640, 3, 3]). size mismatch for output_blocks.5.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.5.0.emb_layers.1.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([768, 512]). 
size mismatch for output_blocks.5.0.emb_layers.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for output_blocks.5.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.5.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.5.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]). size mismatch for output_blocks.5.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.5.0.skip_connection.weight: copying a param with shape torch.Size([1024, 1536, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 640, 1, 1]). size mismatch for output_blocks.5.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.5.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.5.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.5.1.qkv.weight: copying a param with shape torch.Size([3072, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1152, 384, 1]). size mismatch for output_blocks.5.1.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([1152]). size mismatch for output_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([1024, 1024, 1]) from checkpoint, the shape in current model is torch.Size([384, 384, 1]). size mismatch for output_blocks.5.1.proj_out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.6.0.in_layers.0.weight: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for output_blocks.6.0.in_layers.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for output_blocks.6.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1536, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 640, 3, 3]). size mismatch for output_blocks.6.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.6.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([512, 512]). size mismatch for output_blocks.6.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.6.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). 
    size mismatch for output_blocks.6.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.6.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
    size mismatch for output_blocks.6.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.6.0.skip_connection.weight: copying a param with shape torch.Size([512, 1536, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 640, 1, 1]).
    size mismatch for output_blocks.6.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for output_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for output_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
    size mismatch for output_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([512, 512]).
    size mismatch for output_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for output_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
    size mismatch for output_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.7.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
    size mismatch for output_blocks.7.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.8.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]).
    size mismatch for output_blocks.8.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]).
    size mismatch for output_blocks.8.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 384, 3, 3]).
    size mismatch for output_blocks.8.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.8.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([512, 512]).
    size mismatch for output_blocks.8.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for output_blocks.8.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.8.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.8.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
    size mismatch for output_blocks.8.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.8.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 384, 1, 1]).
    size mismatch for output_blocks.8.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.9.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]).
    size mismatch for output_blocks.9.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]).
    size mismatch for output_blocks.9.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 384, 3, 3]).
    size mismatch for output_blocks.9.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for output_blocks.9.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([256, 512]).
    size mismatch for output_blocks.9.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.9.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for output_blocks.9.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for output_blocks.9.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
    size mismatch for output_blocks.9.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for output_blocks.9.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 384, 1, 1]).
    size mismatch for output_blocks.9.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for output_blocks.10.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.10.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.10.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 3, 3]).
    size mismatch for output_blocks.10.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for output_blocks.10.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([256, 512]).
    size mismatch for output_blocks.10.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.10.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for output_blocks.10.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for output_blocks.10.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
    size mismatch for output_blocks.10.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for output_blocks.10.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1]).
    size mismatch for output_blocks.10.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for output_blocks.11.0.in_layers.0.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.11.0.in_layers.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.11.0.in_layers.2.weight: copying a param with shape torch.Size([512, 768, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 3, 3]).
    size mismatch for output_blocks.11.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for output_blocks.11.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([256, 512]).
    size mismatch for output_blocks.11.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for output_blocks.11.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for output_blocks.11.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for output_blocks.11.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
    size mismatch for output_blocks.11.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for output_blocks.11.0.skip_connection.weight: copying a param with shape torch.Size([512, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1]).
    size mismatch for output_blocks.11.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for out.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for out.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for out.2.weight: copying a param with shape torch.Size([6, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([3, 128, 3, 3]).
    size mismatch for out.2.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([3]).
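
    An editorial note on the log above: the doubled widths (256 vs. 128 at out.0) and the 6-vs-3 output channels at out.2 suggest the checkpoint was built with num_channels 256 and learn_sigma True, while the freshly constructed model uses 128 and learn_sigma False. The usual fix is to build the model with flags matching the checkpoint. A minimal sketch; the flag values below are inferred from the shapes, not confirmed by this issue:

        from guided_diffusion.script_util import create_model

        # Rebuild with flags matching the checkpoint's shapes (inferred, not
        # confirmed): 6 output channels => learn_sigma=True; 256-wide feature
        # maps at out.0 => num_channels=256.
        model = create_model(
            image_size=256,      # assumption: the resolution the ckpt was trained at
            num_channels=256,
            num_res_blocks=2,    # assumption
            learn_sigma=True,
        )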

    opened by mehdizemni 3
  • out of memory in FID evaluation

    out of memory in FID evaluation

    failed to allocate 64.00M (67108864 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory

    Has anyone else had GPU out-of-memory problems when running the evaluation?
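
    One possible workaround, sketched as an editorial note (the session and option names are standard TensorFlow 1.x API; whether the repo's evaluator already sets them is an assumption to verify against evaluations/evaluator.py): stop TensorFlow from pre-allocating the whole GPU before the evaluation graph is built.

        import tensorflow.compat.v1 as tf

        # Allocate GPU memory lazily instead of grabbing the whole device, and
        # optionally cap the fraction, so the eval graph can coexist with other
        # processes on the same GPU.
        config = tf.ConfigProto()
        config.gpu_options.allow_growth = True
        config.gpu_options.per_process_gpu_memory_fraction = 0.5  # optional hard cap
        sess = tf.Session(config=config)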

    opened by forever208 2
  • [Bugs] Classifier is conditioning on x_{t+1} instead of x_{t} and grad calculation

    [Bugs] Classifier is conditioning on x_{t+1} instead of x_{t} and grad calculation

    Hi @unixpickle @prafullasd @erinbeesley, I think I found 2 bugs:

    1. Shouldn't we pass out["mean"] (x_{t}) instead of x (x_{t+1}) here (similarly t-1 instead of t): https://github.com/openai/guided-diffusion/blob/main/guided_diffusion/gaussian_diffusion.py#L435

    2. Shouldn't we separate the grad calculation here? https://github.com/openai/guided-diffusion/blob/main/scripts/classifier_sample.py#L61 We need the gradient of the i-th image in the batch w.r.t. its own log-prob, not w.r.t. the sum of log-probs. It makes no sense to optimize the sum, since we want each image to be guided by its own class.

    I might have misunderstood something; please let me know if so! :)
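
    An editorial note on point 2: because log p(y_i | x_i) depends only on x_i, the gradient of the summed log-probs w.r.t. x_i equals the gradient of x_i's own log-prob, so the summed formulation does not mix images. A minimal, self-contained sketch (toy classifier, not repo code) that checks this:

        import torch

        # Toy batch and "classifier": log p(y_i | x_i) depends only on image i.
        x = torch.randn(4, 3, 8, 8, requires_grad=True)
        y = torch.randint(0, 10, (4,))
        w = torch.randn(3 * 8 * 8, 10)
        log_probs = torch.log_softmax(x.view(4, -1) @ w, dim=-1)
        selected = log_probs[torch.arange(4), y]  # per-image log-probs

        # Gradient of the *sum* vs. gradient of a single image's log-prob.
        grad_sum = torch.autograd.grad(selected.sum(), x, retain_graph=True)[0]
        grad_0 = torch.autograd.grad(selected[0], x)[0]
        assert torch.allclose(grad_sum[0], grad_0[0])  # identical for image 0
        assert torch.count_nonzero(grad_0[1:]) == 0    # no cross-image mixing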

    opened by gordicaleksa 2
  • Training doesn't really work

    Training doesn't really work

    When running

    python scripts/image_train.py --data_dir path/to/images $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS

    the code runs for about 2 seconds and exits without creating any log folder under /tmp. There is no error either.
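
    An editorial hint, hedged (verify against guided_diffusion/logger.py in your checkout): the logger in this family of repos falls back to a timestamped folder under /tmp when OPENAI_LOGDIR is unset, so the output may simply be landing somewhere you are not looking. Setting the variable makes the location explicit; also double-check that the shell variables in the command are actually populated, since empty flags silently fall back to defaults.

        import os

        # Point the repo's logger at an explicit directory (an assumption about
        # logger.configure(); when OPENAI_LOGDIR is unset it picks a /tmp dir).
        os.environ["OPENAI_LOGDIR"] = "/path/to/logs"
        print("logs will go to:", os.environ["OPENAI_LOGDIR"])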

    opened by baniasbaabe 2
  • Add dimension as input to create_model

    Add dimension as input to create_model

    Hi, thank you so much for open sourcing the code. I was planning to use this repo for 3D images, and although I can see that the UNet class allows 3D inputs, the "dims" keyword is not passed through the create_model function in script_util.py. If it is alright with the owners of the repo, I would like to push a change that allows this (a sketch is below). The change would keep the default dimension at 2, so no breaking changes should result.
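
    A sketch of the proposed change; the create_model signature here is abbreviated (the real one takes many more flags), so only the dims plumbing should be read literally:

        from guided_diffusion.unet import UNetModel

        # Forward a `dims` argument (1, 2, or 3) through create_model so
        # UNetModel, which already accepts it, can build e.g. 3D convolutions.
        def create_model(image_size, num_channels, num_res_blocks, dims=2, **kwargs):
            return UNetModel(
                image_size=image_size,
                in_channels=3,
                model_channels=num_channels,
                num_res_blocks=num_res_blocks,
                dims=dims,  # new: defaults to 2, so existing callers are unaffected
                **kwargs,
            )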

    opened by jamaliki 2
  • guided diffusion super resolution network training is diverging

    guided diffusion super resolution network training is diverging

    Hello everyone,

    I am working with guided diffusion. I would like to reproduce the results of the repository for the 64->256 super resolution network.

    My issue is that the upsampled images look good for the first 5000 iterations, but then the loss rapidly increases and the output is only noise from 6000 iterations on.

    The difference between my trained model and the author-provided model is that I don't do any conditioning, so I removed the classifier. The training batch then provides only the "low_res" image, not the conditioning label "y".

    What do you think is wrong here? Do you have any hints for debugging?

    Please have a look at the produced samples and loss at: https://i.imgur.com/upuzHPg.png (I couldn't embed the image here)

    EDIT: my batch size is 3, whereas the original repo seems to use a batch size of 256. This could be the culprit. As a second question: which GPU can handle such a massive batch size of 256? Is it a multi-GPU configuration? (A gradient-accumulation sketch follows below.)
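
    An editorial note on the batch-size question: the released models were trained across many GPUs (the README commands use mpiexec over N ranks), and the training loop appears to expose a --microbatch flag that splits a large logical batch into small per-step chunks; verify in guided_diffusion/train_util.py. The same idea as a generic, self-contained PyTorch sketch (toy model, not repo code):

        import torch
        from torch import nn

        # Toy stand-ins: in the repo the model is the UNet and the loss comes
        # from the diffusion training objective.
        model = nn.Linear(16, 16)
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

        effective_batch, micro = 256, 4
        accum_steps = effective_batch // micro  # 64 micro-batches per optimizer step

        optimizer.zero_grad()
        for _ in range(accum_steps):
            x = torch.randn(micro, 16)
            loss = ((model(x) - x) ** 2).mean() / accum_steps  # rescale to keep the mean
            loss.backward()  # gradients accumulate across micro-batches
        optimizer.step()
        optimizer.zero_grad()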

    My goal is to produce a solid baseline to work on.

    Thanks in advance to anyone who helps,

    Stefano

    opened by stsavian 1
  • Question about training on multi GPUs

    Question about training on multi GPUs

    Hi, I train on multiple GPUs with mpiexec -n 8 python scripts/image_train.py. However, I ran into an issue where the checkpoint could not be loaded onto all GPUs, and I hit OOM when fine-tuning the 512*512 model. I didn't encounter this problem when training on a single GPU. Could you advise what I should do?
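
    One common cause, offered as an editorial guess rather than a confirmed diagnosis: every MPI rank deserializes the checkpoint straight onto the same GPU. Loading to CPU first and then moving to each rank's own device avoids the spike; check how guided_diffusion/dist_util.py distributes the state dict in your version. A minimal sketch with a stand-in model:

        import torch
        from torch import nn

        model = nn.Linear(8, 8)  # stand-in for the UNet
        local_rank = 0           # in the repo, derived from the MPI rank

        torch.save(model.state_dict(), "model.pt")  # pretend this is the 512x512 ckpt

        # Deserialize on CPU so 8 ranks don't all land on GPU 0 at once, then
        # move the weights to this rank's own device.
        state_dict = torch.load("model.pt", map_location="cpu")
        model.load_state_dict(state_dict)
        if torch.cuda.is_available():
            model.to(torch.device("cuda", local_rank))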

    opened by YUHANG-Ma 1
  • About the gpu memory usage of the training process

    About the gpu memory usage of the training process

    Hello, I have previously reproduced the basic DDPM experiment. With 128*128 images and a batch size of 2, GPU memory usage was about 18 GB; with your code and the same settings, usage is about 5 GB. Is this normal, or might something be wrong with my settings?

    opened by zy-charon 0
  • Parameters for Diffusion Model and Classifier Training

    Parameters for Diffusion Model and Classifier Training

    Hello, I would like to ask about the parameters in the diffusion model and classifier: predict_xstart, rescale_timesteps, rescale_learned_sigmas, classifier_use_scale_shift_norm, classifier_resblock_updown, classifier_pool, use_checkpoint, use_scale_shift_norm, resblock_updown, use_fp16, use_new_attention_order, etc. What do they mean? How should they be set during training? (A quick way to print the defaults is sketched below.)
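
    An editorial sketch for inspecting every flag and its default in one place (the function names come from guided_diffusion/script_util.py; treat the exact names as an assumption for your checkout):

        from guided_diffusion.script_util import (
            classifier_defaults,
            model_and_diffusion_defaults,
        )

        # Print every flag with its default; each key corresponds to a --flag name.
        defaults = {**model_and_diffusion_defaults(), **classifier_defaults()}
        for name, value in defaults.items():
            print(f"{name} = {value}")

    Roughly, and hedged from the paper and the improved-diffusion parent repo: use_fp16 and use_checkpoint trade precision/compute for memory; resblock_updown, use_scale_shift_norm, use_new_attention_order, and the classifier_* mirrors are the architecture choices ablated in the paper; predict_xstart, rescale_timesteps, and rescale_learned_sigmas control the diffusion parameterization and loss scaling inherited from improved-diffusion.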

    opened by zy-charon 0
  • After training the sample pictures get some weird color tints

    After training the sample pictures get some weird color tints

    Sometimes in training I get some weird color casts in my pictures, while the original data has no tints at all. Is there a reason for this, and how can I avoid it?

    Original data is like: (image attachment 40SMA0013_1Y5_01)

    And the output is: (image attachment)

    opened by benjamin-bertram 2
  • How is the diffusion model used in the detection task

    How is the diffusion model used in the detection task

    Hello, I would like to ask which weight file I should download if I use the diffusion model in a detection task, or whether I need several GPUs to train the diffusion model myself.

    opened by QY1994-0919 0
Owner
OpenAI