Unexpected key(s) in state_dict: "blocks.3.attn.gating_param", "blocks.3.attn.qk.weight", "blocks.3.attn.v.weight", "blocks.3.attn.pos_proj.weight", "blocks.3.attn.pos_proj.bias", "blocks.4.attn.gating_param", "blocks.4.attn.qk.weight", "blocks.4.attn.v.weight", "blocks.4.attn.pos_proj.weight", "blocks.4.attn.pos_proj.bias", "blocks.5.attn.gating_param", "blocks.5.attn.qk.weight", "blocks.5.attn.v.weight", "blocks.5.attn.pos_proj.weight", "blocks.5.attn.pos_proj.bias", "blocks.6.attn.gating_param", "blocks.6.attn.qk.weight", "blocks.6.attn.v.weight", "blocks.6.attn.pos_proj.weight", "blocks.6.attn.pos_proj.bias", "blocks.7.attn.gating_param", "blocks.7.attn.qk.weight", "blocks.7.attn.v.weight", "blocks.7.attn.pos_proj.weight", "blocks.7.attn.pos_proj.bias", "blocks.8.attn.gating_param", "blocks.8.attn.qk.weight", "blocks.8.attn.v.weight", "blocks.8.attn.pos_proj.weight", "blocks.8.attn.pos_proj.bias", "blocks.9.attn.gating_param", "blocks.9.attn.qk.weight", "blocks.9.attn.v.weight", "blocks.9.attn.pos_proj.weight", "blocks.9.attn.pos_proj.bias".
size mismatch for cls_token: copying a param with shape torch.Size([1, 1, 192]) from checkpoint, the shape in current model is torch.Size([1, 1, 768]).
size mismatch for pos_embed: copying a param with shape torch.Size([1, 196, 192]) from checkpoint, the shape in current model is torch.Size([1, 196, 768]).
size mismatch for patch_embed.proj.weight: copying a param with shape torch.Size([192, 3, 16, 16]) from checkpoint, the shape in current model is torch.Size([768, 3, 16, 16]).
size mismatch for patch_embed.proj.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.0.norm1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.0.norm1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.0.attn.qk.weight: copying a param with shape torch.Size([384, 192]) from checkpoint, the shape in current model is torch.Size([1536, 768]).
size mismatch for blocks.0.attn.v.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for blocks.0.attn.proj.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for blocks.0.attn.proj.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.0.norm2.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.0.norm2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.0.mlp.fc1.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for blocks.0.mlp.fc1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for blocks.0.mlp.fc2.weight: copying a param with shape torch.Size([192, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for blocks.0.mlp.fc2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.1.norm1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.1.norm1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.1.attn.qk.weight: copying a param with shape torch.Size([384, 192]) from checkpoint, the shape in current model is torch.Size([1536, 768]).
size mismatch for blocks.1.attn.v.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for blocks.1.attn.proj.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for blocks.1.attn.proj.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.1.norm2.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.1.norm2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.1.mlp.fc1.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for blocks.1.mlp.fc1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for blocks.1.mlp.fc2.weight: copying a param with shape torch.Size([192, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for blocks.1.mlp.fc2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.2.norm1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.2.norm1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.2.attn.qk.weight: copying a param with shape torch.Size([384, 192]) from checkpoint, the shape in current model is torch.Size([1536, 768]).
size mismatch for blocks.2.attn.v.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for blocks.2.attn.proj.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for blocks.2.attn.proj.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.2.norm2.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.2.norm2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.2.mlp.fc1.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for blocks.2.mlp.fc1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for blocks.2.mlp.fc2.weight: copying a param with shape torch.Size([192, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for blocks.2.mlp.fc2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.3.norm1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.3.norm1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.3.attn.proj.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for blocks.3.attn.proj.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.3.norm2.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.3.norm2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.3.mlp.fc1.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for blocks.3.mlp.fc1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for blocks.3.mlp.fc2.weight: copying a param with shape torch.Size([192, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for blocks.3.mlp.fc2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.4.norm1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.4.norm1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.4.attn.proj.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for blocks.4.attn.proj.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.4.norm2.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.4.norm2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.4.mlp.fc1.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for blocks.4.mlp.fc1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for blocks.4.mlp.fc2.weight: copying a param with shape torch.Size([192, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for blocks.4.mlp.fc2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.5.norm1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.5.norm1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.5.attn.proj.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for blocks.5.attn.proj.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.5.norm2.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.5.norm2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.5.mlp.fc1.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for blocks.5.mlp.fc1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for blocks.5.mlp.fc2.weight: copying a param with shape torch.Size([192, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for blocks.5.mlp.fc2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.6.norm1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.6.norm1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.6.attn.proj.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for blocks.6.attn.proj.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.6.norm2.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.6.norm2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.6.mlp.fc1.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for blocks.6.mlp.fc1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for blocks.6.mlp.fc2.weight: copying a param with shape torch.Size([192, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for blocks.6.mlp.fc2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.7.norm1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.7.norm1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.7.attn.proj.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for blocks.7.attn.proj.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.7.norm2.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.7.norm2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.7.mlp.fc1.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for blocks.7.mlp.fc1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for blocks.7.mlp.fc2.weight: copying a param with shape torch.Size([192, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for blocks.7.mlp.fc2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.8.norm1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.8.norm1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.8.attn.proj.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for blocks.8.attn.proj.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.8.norm2.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.8.norm2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.8.mlp.fc1.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for blocks.8.mlp.fc1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for blocks.8.mlp.fc2.weight: copying a param with shape torch.Size([192, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for blocks.8.mlp.fc2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.9.norm1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.9.norm1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.9.attn.proj.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for blocks.9.attn.proj.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.9.norm2.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.9.norm2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.9.mlp.fc1.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for blocks.9.mlp.fc1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for blocks.9.mlp.fc2.weight: copying a param with shape torch.Size([192, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for blocks.9.mlp.fc2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.10.norm1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.10.norm1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.10.attn.qkv.weight: copying a param with shape torch.Size([576, 192]) from checkpoint, the shape in current model is torch.Size([2304, 768]).
size mismatch for blocks.10.attn.proj.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for blocks.10.attn.proj.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.10.norm2.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.10.norm2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.10.mlp.fc1.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for blocks.10.mlp.fc1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for blocks.10.mlp.fc2.weight: copying a param with shape torch.Size([192, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for blocks.10.mlp.fc2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.11.norm1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.11.norm1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.11.attn.qkv.weight: copying a param with shape torch.Size([576, 192]) from checkpoint, the shape in current model is torch.Size([2304, 768]).
size mismatch for blocks.11.attn.proj.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for blocks.11.attn.proj.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.11.norm2.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.11.norm2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for blocks.11.mlp.fc1.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for blocks.11.mlp.fc1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for blocks.11.mlp.fc2.weight: copying a param with shape torch.Size([192, 768]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for blocks.11.mlp.fc2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for norm.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for norm.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for head.weight: copying a param with shape torch.Size([1000, 192]) from checkpoint, the shape in current model is torch.Size([1000, 768]).
May I ask what is causing these errors?
How do I solve them?
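
Every size mismatch in the log pairs a width of 192 in the checkpoint with 768 in the current model, which suggests (but does not prove) that the checkpoint was saved from a smaller configuration than the one being instantiated. As a rough way to narrow this down, the sketch below groups the differences between two `{name: shape}` mappings. The helper name `diff_state_dict_shapes` is hypothetical, and the shape given for `blocks.3.attn.qk.weight` is an assumption (copied from `blocks.0`), since the log does not print shapes for unexpected keys.

```python
def diff_state_dict_shapes(ckpt_shapes, model_shapes):
    """Group the differences between two {param_name: shape} mappings."""
    # Keys only the checkpoint has ("Unexpected key(s)" in the error).
    unexpected = sorted(k for k in ckpt_shapes if k not in model_shapes)
    # Keys only the model has ("Missing key(s)" in the error).
    missing = sorted(k for k in model_shapes if k not in ckpt_shapes)
    # Keys present on both sides whose shapes differ ("size mismatch").
    mismatched = {
        k: (tuple(ckpt_shapes[k]), tuple(model_shapes[k]))
        for k in ckpt_shapes
        if k in model_shapes and tuple(ckpt_shapes[k]) != tuple(model_shapes[k])
    }
    return unexpected, missing, mismatched


# A few entries taken from the error log above.
ckpt_shapes = {
    "cls_token": (1, 1, 192),
    "head.weight": (1000, 192),
    "blocks.3.attn.qk.weight": (384, 192),  # assumed shape, not shown in the log
}
model_shapes = {
    "cls_token": (1, 1, 768),
    "head.weight": (1000, 768),
}

unexpected, missing, mismatched = diff_state_dict_shapes(ckpt_shapes, model_shapes)
print("unexpected:", unexpected)
print("missing:", missing)
for name, (ckpt, model) in mismatched.items():
    print(f"{name}: checkpoint {ckpt} vs model {model}")
```

With PyTorch, the two mappings can be built as `{k: tuple(v.shape) for k, v in state_dict.items()}` from the loaded checkpoint's state dict and from `model.state_dict()`. If the widths differ consistently, as they do here, the likely fix is to construct the model with the same configuration the checkpoint was trained with, rather than to load with `strict=False`, which skips missing and unexpected keys but still raises on size mismatches.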