Hi, thank you for sharing your work.
When I follow the instructions in the notebook file (VideoTransformer_demo.ipynb), I got trouble loading the pre-trained weights of ViViT model.
After downloading and placing the "./vivit_model.pth" file, I was able to instantiate the ViViT model.
However, the log says that there are many missing keys in the given pth file.
Is it the desired behavior? or should I do some preprocessing to match the parameter name?
This is the output after parameter loading.
load model finished, the missing key of transformer is:['transformer_layers.0.layers.0.attentions.0.attn.qkv.weight', 'transformer_layers.0.layers.0.attentions.0.attn.qkv.bias', 'transformer_layers.0.layers.0.attentions.0.attn.proj.weight', 'transformer_layers.0.layers.0.attentions.0.attn.proj.bias', 'transformer_layers.0.layers.1.attentions.0.attn.qkv.weight', 'transformer_layers.0.layers.1.attentions.0.attn.qkv.bias', 'transformer_layers.0.layers.1.attentions.0.attn.proj.weight', 'transformer_layers.0.layers.1.attentions.0.attn.proj.bias', 'transformer_layers.0.layers.2.attentions.0.attn.qkv.weight', 'transformer_layers.0.layers.2.attentions.0.attn.qkv.bias', 'transformer_layers.0.layers.2.attentions.0.attn.proj.weight', 'transformer_layers.0.layers.2.attentions.0.attn.proj.bias', 'transformer_layers.0.layers.3.attentions.0.attn.qkv.weight', 'transformer_layers.0.layers.3.attentions.0.attn.qkv.bias', 'transformer_layers.0.layers.3.attentions.0.attn.proj.weight', 'transformer_layers.0.layers.3.attentions.0.attn.proj.bias', 'transformer_layers.0.layers.4.attentions.0.attn.qkv.weight', 'transformer_layers.0.layers.4.attentions.0.attn.qkv.bias', 'transformer_layers.0.layers.4.attentions.0.attn.proj.weight', 'transformer_layers.0.layers.4.attentions.0.attn.proj.bias', 'transformer_layers.0.layers.5.attentions.0.attn.qkv.weight', 'transformer_layers.0.layers.5.attentions.0.attn.qkv.bias', 'transformer_layers.0.layers.5.attentions.0.attn.proj.weight', 'transformer_layers.0.layers.5.attentions.0.attn.proj.bias', 'transformer_layers.0.layers.6.attentions.0.attn.qkv.weight', 'transformer_layers.0.layers.6.attentions.0.attn.qkv.bias', 'transformer_layers.0.layers.6.attentions.0.attn.proj.weight', 'transformer_layers.0.layers.6.attentions.0.attn.proj.bias', 'transformer_layers.0.layers.7.attentions.0.attn.qkv.weight', 'transformer_layers.0.layers.7.attentions.0.attn.qkv.bias', 'transformer_layers.0.layers.7.attentions.0.attn.proj.weight', 'transformer_layers.0.layers.7.attentions.0.attn.proj.bias', 'transformer_layers.0.layers.8.attentions.0.attn.qkv.weight', 'transformer_layers.0.layers.8.attentions.0.attn.qkv.bias', 'transformer_layers.0.layers.8.attentions.0.attn.proj.weight', 'transformer_layers.0.layers.8.attentions.0.attn.proj.bias', 'transformer_layers.0.layers.9.attentions.0.attn.qkv.weight', 'transformer_layers.0.layers.9.attentions.0.attn.qkv.bias', 'transformer_layers.0.layers.9.attentions.0.attn.proj.weight', 'transformer_layers.0.layers.9.attentions.0.attn.proj.bias', 'transformer_layers.0.layers.10.attentions.0.attn.qkv.weight', 'transformer_layers.0.layers.10.attentions.0.attn.qkv.bias', 'transformer_layers.0.layers.10.attentions.0.attn.proj.weight', 'transformer_layers.0.layers.10.attentions.0.attn.proj.bias', 'transformer_layers.0.layers.11.attentions.0.attn.qkv.weight', 'transformer_layers.0.layers.11.attentions.0.attn.qkv.bias', 'transformer_layers.0.layers.11.attentions.0.attn.proj.weight', 'transformer_layers.0.layers.11.attentions.0.attn.proj.bias', 'transformer_layers.1.layers.0.attentions.0.attn.qkv.weight', 'transformer_layers.1.layers.0.attentions.0.attn.qkv.bias', 'transformer_layers.1.layers.0.attentions.0.attn.proj.weight', 'transformer_layers.1.layers.0.attentions.0.attn.proj.bias', 'transformer_layers.1.layers.1.attentions.0.attn.qkv.weight', 'transformer_layers.1.layers.1.attentions.0.attn.qkv.bias', 'transformer_layers.1.layers.1.attentions.0.attn.proj.weight', 'transformer_layers.1.layers.1.attentions.0.attn.proj.bias', 'transformer_layers.1.layers.2.attentions.0.attn.qkv.weight', 'transformer_layers.1.layers.2.attentions.0.attn.qkv.bias', 'transformer_layers.1.layers.2.attentions.0.attn.proj.weight', 'transformer_layers.1.layers.2.attentions.0.attn.proj.bias', 'transformer_layers.1.layers.3.attentions.0.attn.qkv.weight', 'transformer_layers.1.layers.3.attentions.0.attn.qkv.bias', 'transformer_layers.1.layers.3.attentions.0.attn.proj.weight', 'transformer_layers.1.layers.3.attentions.0.attn.proj.bias'], cls is:[]
Thank you in advance!
+edit)
FYI, these are the unexpected keys from the load_state_dict().
transformer unexpected: ['cls_head.weight', 'cls_head.bias', 'transformer_layers.0.layers.0.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.0.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.0.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.0.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.1.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.1.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.1.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.1.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.2.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.2.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.2.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.2.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.3.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.3.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.3.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.3.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.4.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.4.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.4.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.4.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.5.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.5.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.5.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.5.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.6.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.6.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.6.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.6.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.7.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.7.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.7.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.7.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.8.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.8.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.8.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.8.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.9.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.9.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.9.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.9.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.10.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.10.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.10.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.10.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.11.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.11.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.11.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.11.attentions.0.attn.out_proj.bias', 'transformer_layers.1.layers.0.attentions.0.attn.in_proj_weight', 'transformer_layers.1.layers.0.attentions.0.attn.in_proj_bias', 'transformer_layers.1.layers.0.attentions.0.attn.out_proj.weight', 'transformer_layers.1.layers.0.attentions.0.attn.out_proj.bias', 'transformer_layers.1.layers.1.attentions.0.attn.in_proj_weight', 'transformer_layers.1.layers.1.attentions.0.attn.in_proj_bias', 'transformer_layers.1.layers.1.attentions.0.attn.out_proj.weight', 'transformer_layers.1.layers.1.attentions.0.attn.out_proj.bias', 'transformer_layers.1.layers.2.attentions.0.attn.in_proj_weight', 'transformer_layers.1.layers.2.attentions.0.attn.in_proj_bias', 'transformer_layers.1.layers.2.attentions.0.attn.out_proj.weight', 'transformer_layers.1.layers.2.attentions.0.attn.out_proj.bias', 'transformer_layers.1.layers.3.attentions.0.attn.in_proj_weight', 'transformer_layers.1.layers.3.attentions.0.attn.in_proj_bias', 'transformer_layers.1.layers.3.attentions.0.attn.out_proj.weight', 'transformer_layers.1.layers.3.attentions.0.attn.out_proj.bias']
classification head unexpected: ['cls_token', 'pos_embed', 'time_embed', 'patch_embed.projection.weight', 'patch_embed.projection.bias', 'transformer_layers.0.layers.0.attentions.0.norm.weight', 'transformer_layers.0.layers.0.attentions.0.norm.bias', 'transformer_layers.0.layers.0.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.0.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.0.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.0.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.0.ffns.0.norm.weight', 'transformer_layers.0.layers.0.ffns.0.norm.bias', 'transformer_layers.0.layers.0.ffns.0.layers.0.0.weight', 'transformer_layers.0.layers.0.ffns.0.layers.0.0.bias', 'transformer_layers.0.layers.0.ffns.0.layers.1.weight', 'transformer_layers.0.layers.0.ffns.0.layers.1.bias', 'transformer_layers.0.layers.1.attentions.0.norm.weight', 'transformer_layers.0.layers.1.attentions.0.norm.bias', 'transformer_layers.0.layers.1.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.1.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.1.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.1.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.1.ffns.0.norm.weight', 'transformer_layers.0.layers.1.ffns.0.norm.bias', 'transformer_layers.0.layers.1.ffns.0.layers.0.0.weight', 'transformer_layers.0.layers.1.ffns.0.layers.0.0.bias', 'transformer_layers.0.layers.1.ffns.0.layers.1.weight', 'transformer_layers.0.layers.1.ffns.0.layers.1.bias', 'transformer_layers.0.layers.2.attentions.0.norm.weight', 'transformer_layers.0.layers.2.attentions.0.norm.bias', 'transformer_layers.0.layers.2.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.2.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.2.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.2.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.2.ffns.0.norm.weight', 'transformer_layers.0.layers.2.ffns.0.norm.bias', 'transformer_layers.0.layers.2.ffns.0.layers.0.0.weight', 'transformer_layers.0.layers.2.ffns.0.layers.0.0.bias', 'transformer_layers.0.layers.2.ffns.0.layers.1.weight', 'transformer_layers.0.layers.2.ffns.0.layers.1.bias', 'transformer_layers.0.layers.3.attentions.0.norm.weight', 'transformer_layers.0.layers.3.attentions.0.norm.bias', 'transformer_layers.0.layers.3.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.3.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.3.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.3.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.3.ffns.0.norm.weight', 'transformer_layers.0.layers.3.ffns.0.norm.bias', 'transformer_layers.0.layers.3.ffns.0.layers.0.0.weight', 'transformer_layers.0.layers.3.ffns.0.layers.0.0.bias', 'transformer_layers.0.layers.3.ffns.0.layers.1.weight', 'transformer_layers.0.layers.3.ffns.0.layers.1.bias', 'transformer_layers.0.layers.4.attentions.0.norm.weight', 'transformer_layers.0.layers.4.attentions.0.norm.bias', 'transformer_layers.0.layers.4.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.4.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.4.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.4.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.4.ffns.0.norm.weight', 'transformer_layers.0.layers.4.ffns.0.norm.bias', 'transformer_layers.0.layers.4.ffns.0.layers.0.0.weight', 'transformer_layers.0.layers.4.ffns.0.layers.0.0.bias', 'transformer_layers.0.layers.4.ffns.0.layers.1.weight', 'transformer_layers.0.layers.4.ffns.0.layers.1.bias', 'transformer_layers.0.layers.5.attentions.0.norm.weight', 'transformer_layers.0.layers.5.attentions.0.norm.bias', 'transformer_layers.0.layers.5.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.5.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.5.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.5.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.5.ffns.0.norm.weight', 'transformer_layers.0.layers.5.ffns.0.norm.bias', 'transformer_layers.0.layers.5.ffns.0.layers.0.0.weight', 'transformer_layers.0.layers.5.ffns.0.layers.0.0.bias', 'transformer_layers.0.layers.5.ffns.0.layers.1.weight', 'transformer_layers.0.layers.5.ffns.0.layers.1.bias', 'transformer_layers.0.layers.6.attentions.0.norm.weight', 'transformer_layers.0.layers.6.attentions.0.norm.bias', 'transformer_layers.0.layers.6.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.6.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.6.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.6.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.6.ffns.0.norm.weight', 'transformer_layers.0.layers.6.ffns.0.norm.bias', 'transformer_layers.0.layers.6.ffns.0.layers.0.0.weight', 'transformer_layers.0.layers.6.ffns.0.layers.0.0.bias', 'transformer_layers.0.layers.6.ffns.0.layers.1.weight', 'transformer_layers.0.layers.6.ffns.0.layers.1.bias', 'transformer_layers.0.layers.7.attentions.0.norm.weight', 'transformer_layers.0.layers.7.attentions.0.norm.bias', 'transformer_layers.0.layers.7.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.7.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.7.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.7.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.7.ffns.0.norm.weight', 'transformer_layers.0.layers.7.ffns.0.norm.bias', 'transformer_layers.0.layers.7.ffns.0.layers.0.0.weight', 'transformer_layers.0.layers.7.ffns.0.layers.0.0.bias', 'transformer_layers.0.layers.7.ffns.0.layers.1.weight', 'transformer_layers.0.layers.7.ffns.0.layers.1.bias', 'transformer_layers.0.layers.8.attentions.0.norm.weight', 'transformer_layers.0.layers.8.attentions.0.norm.bias', 'transformer_layers.0.layers.8.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.8.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.8.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.8.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.8.ffns.0.norm.weight', 'transformer_layers.0.layers.8.ffns.0.norm.bias', 'transformer_layers.0.layers.8.ffns.0.layers.0.0.weight', 'transformer_layers.0.layers.8.ffns.0.layers.0.0.bias', 'transformer_layers.0.layers.8.ffns.0.layers.1.weight', 'transformer_layers.0.layers.8.ffns.0.layers.1.bias', 'transformer_layers.0.layers.9.attentions.0.norm.weight', 'transformer_layers.0.layers.9.attentions.0.norm.bias', 'transformer_layers.0.layers.9.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.9.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.9.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.9.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.9.ffns.0.norm.weight', 'transformer_layers.0.layers.9.ffns.0.norm.bias', 'transformer_layers.0.layers.9.ffns.0.layers.0.0.weight', 'transformer_layers.0.layers.9.ffns.0.layers.0.0.bias', 'transformer_layers.0.layers.9.ffns.0.layers.1.weight', 'transformer_layers.0.layers.9.ffns.0.layers.1.bias', 'transformer_layers.0.layers.10.attentions.0.norm.weight', 'transformer_layers.0.layers.10.attentions.0.norm.bias', 'transformer_layers.0.layers.10.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.10.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.10.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.10.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.10.ffns.0.norm.weight', 'transformer_layers.0.layers.10.ffns.0.norm.bias', 'transformer_layers.0.layers.10.ffns.0.layers.0.0.weight', 'transformer_layers.0.layers.10.ffns.0.layers.0.0.bias', 'transformer_layers.0.layers.10.ffns.0.layers.1.weight', 'transformer_layers.0.layers.10.ffns.0.layers.1.bias', 'transformer_layers.0.layers.11.attentions.0.norm.weight', 'transformer_layers.0.layers.11.attentions.0.norm.bias', 'transformer_layers.0.layers.11.attentions.0.attn.in_proj_weight', 'transformer_layers.0.layers.11.attentions.0.attn.in_proj_bias', 'transformer_layers.0.layers.11.attentions.0.attn.out_proj.weight', 'transformer_layers.0.layers.11.attentions.0.attn.out_proj.bias', 'transformer_layers.0.layers.11.ffns.0.norm.weight', 'transformer_layers.0.layers.11.ffns.0.norm.bias', 'transformer_layers.0.layers.11.ffns.0.layers.0.0.weight', 'transformer_layers.0.layers.11.ffns.0.layers.0.0.bias', 'transformer_layers.0.layers.11.ffns.0.layers.1.weight', 'transformer_layers.0.layers.11.ffns.0.layers.1.bias', 'transformer_layers.1.layers.0.attentions.0.norm.weight', 'transformer_layers.1.layers.0.attentions.0.norm.bias', 'transformer_layers.1.layers.0.attentions.0.attn.in_proj_weight', 'transformer_layers.1.layers.0.attentions.0.attn.in_proj_bias', 'transformer_layers.1.layers.0.attentions.0.attn.out_proj.weight', 'transformer_layers.1.layers.0.attentions.0.attn.out_proj.bias', 'transformer_layers.1.layers.0.ffns.0.norm.weight', 'transformer_layers.1.layers.0.ffns.0.norm.bias', 'transformer_layers.1.layers.0.ffns.0.layers.0.0.weight', 'transformer_layers.1.layers.0.ffns.0.layers.0.0.bias', 'transformer_layers.1.layers.0.ffns.0.layers.1.weight', 'transformer_layers.1.layers.0.ffns.0.layers.1.bias', 'transformer_layers.1.layers.1.attentions.0.norm.weight', 'transformer_layers.1.layers.1.attentions.0.norm.bias', 'transformer_layers.1.layers.1.attentions.0.attn.in_proj_weight', 'transformer_layers.1.layers.1.attentions.0.attn.in_proj_bias', 'transformer_layers.1.layers.1.attentions.0.attn.out_proj.weight', 'transformer_layers.1.layers.1.attentions.0.attn.out_proj.bias', 'transformer_layers.1.layers.1.ffns.0.norm.weight', 'transformer_layers.1.layers.1.ffns.0.norm.bias', 'transformer_layers.1.layers.1.ffns.0.layers.0.0.weight', 'transformer_layers.1.layers.1.ffns.0.layers.0.0.bias', 'transformer_layers.1.layers.1.ffns.0.layers.1.weight', 'transformer_layers.1.layers.1.ffns.0.layers.1.bias', 'transformer_layers.1.layers.2.attentions.0.norm.weight', 'transformer_layers.1.layers.2.attentions.0.norm.bias', 'transformer_layers.1.layers.2.attentions.0.attn.in_proj_weight', 'transformer_layers.1.layers.2.attentions.0.attn.in_proj_bias', 'transformer_layers.1.layers.2.attentions.0.attn.out_proj.weight', 'transformer_layers.1.layers.2.attentions.0.attn.out_proj.bias', 'transformer_layers.1.layers.2.ffns.0.norm.weight', 'transformer_layers.1.layers.2.ffns.0.norm.bias', 'transformer_layers.1.layers.2.ffns.0.layers.0.0.weight', 'transformer_layers.1.layers.2.ffns.0.layers.0.0.bias', 'transformer_layers.1.layers.2.ffns.0.layers.1.weight', 'transformer_layers.1.layers.2.ffns.0.layers.1.bias', 'transformer_layers.1.layers.3.attentions.0.norm.weight', 'transformer_layers.1.layers.3.attentions.0.norm.bias', 'transformer_layers.1.layers.3.attentions.0.attn.in_proj_weight', 'transformer_layers.1.layers.3.attentions.0.attn.in_proj_bias', 'transformer_layers.1.layers.3.attentions.0.attn.out_proj.weight', 'transformer_layers.1.layers.3.attentions.0.attn.out_proj.bias', 'transformer_layers.1.layers.3.ffns.0.norm.weight', 'transformer_layers.1.layers.3.ffns.0.norm.bias', 'transformer_layers.1.layers.3.ffns.0.layers.0.0.weight', 'transformer_layers.1.layers.3.ffns.0.layers.0.0.bias', 'transformer_layers.1.layers.3.ffns.0.layers.1.weight', 'transformer_layers.1.layers.3.ffns.0.layers.1.bias', 'norm.weight', 'norm.bias']