The script throws the following warning when running multi_graph training:
Some weights of BartForConditionalGeneration were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['model.decoder.layers.0.resweight', 'model.decoder.layers.0.resweight_2', 'model.decoder.layers.0.discourse_attn.k_proj.weight', 'model.decoder.layers.0.discourse_attn.k_proj.bias', 'model.decoder.layers.0.discourse_attn.v_proj.weight', 'model.decoder.layers.0.discourse_attn.v_proj.bias', 'model.decoder.layers.0.discourse_attn.q_proj.weight', 'model.decoder.layers.0.discourse_attn.q_proj.bias', 'model.decoder.layers.0.discourse_attn.out_proj.weight', 'model.decoder.layers.0.discourse_attn.out_proj.bias', 'model.decoder.layers.0.discourse_attn_layer_norm.weight', 'model.decoder.layers.0.discourse_attn_layer_norm.bias', 'model.decoder.layers.0.action_attn.k_proj.weight', 'model.decoder.layers.0.action_attn.k_proj.bias', 'model.decoder.layers.0.action_attn.v_proj.weight', 'model.decoder.layers.0.action_attn.v_proj.bias', 'model.decoder.layers.0.action_attn.q_proj.weight', 'model.decoder.layers.0.action_attn.q_proj.bias', 'model.decoder.layers.0.action_attn.out_proj.weight', 'model.decoder.layers.0.action_attn.out_proj.bias', 'model.decoder.layers.0.action_attn_layer_norm.weight', 'model.decoder.layers.0.action_attn_layer_norm.bias', 'model.decoder.layers.0.composit_layer.weight', 'model.decoder.layers.0.composit_layer.bias', 'model.decoder.layers.0.composit_layer_norm.weight', 'model.decoder.layers.0.composit_layer_norm.bias', 'model.decoder.layers.1.resweight', 'model.decoder.layers.1.resweight_2', 'model.decoder.layers.1.discourse_attn.k_proj.weight', 'model.decoder.layers.1.discourse_attn.k_proj.bias', 'model.decoder.layers.1.discourse_attn.v_proj.weight', 'model.decoder.layers.1.discourse_attn.v_proj.bias', 'model.decoder.layers.1.discourse_attn.q_proj.weight', 'model.decoder.layers.1.discourse_attn.q_proj.bias', 'model.decoder.layers.1.discourse_attn.out_proj.weight', 'model.decoder.layers.1.discourse_attn.out_proj.bias', 
'model.decoder.layers.1.discourse_attn_layer_norm.weight', 'model.decoder.layers.1.discourse_attn_layer_norm.bias', 'model.decoder.layers.1.action_attn.k_proj.weight', 'model.decoder.layers.1.action_attn.k_proj.bias', 'model.decoder.layers.1.action_attn.v_proj.weight', 'model.decoder.layers.1.action_attn.v_proj.bias', 'model.decoder.layers.1.action_attn.q_proj.weight', 'model.decoder.layers.1.action_attn.q_proj.bias', 'model.decoder.layers.1.action_attn.out_proj.weight', 'model.decoder.layers.1.action_attn.out_proj.bias', 'model.decoder.layers.1.action_attn_layer_norm.weight', 'model.decoder.layers.1.action_attn_layer_norm.bias', 'model.decoder.layers.1.composit_layer.weight', 'model.decoder.layers.1.composit_layer.bias', 'model.decoder.layers.1.composit_layer_norm.weight', 'model.decoder.layers.1.composit_layer_norm.bias', 'model.decoder.layers.2.resweight', 'model.decoder.layers.2.resweight_2', 'model.decoder.layers.2.discourse_attn.k_proj.weight', 'model.decoder.layers.2.discourse_attn.k_proj.bias', 'model.decoder.layers.2.discourse_attn.v_proj.weight', 'model.decoder.layers.2.discourse_attn.v_proj.bias', 'model.decoder.layers.2.discourse_attn.q_proj.weight', 'model.decoder.layers.2.discourse_attn.q_proj.bias', 'model.decoder.layers.2.discourse_attn.out_proj.weight', 'model.decoder.layers.2.discourse_attn.out_proj.bias', 'model.decoder.layers.2.discourse_attn_layer_norm.weight', 'model.decoder.layers.2.discourse_attn_layer_norm.bias', 'model.decoder.layers.2.action_attn.k_proj.weight', 'model.decoder.layers.2.action_attn.k_proj.bias', 'model.decoder.layers.2.action_attn.v_proj.weight', 'model.decoder.layers.2.action_attn.v_proj.bias', 'model.decoder.layers.2.action_attn.q_proj.weight', 'model.decoder.layers.2.action_attn.q_proj.bias', 'model.decoder.layers.2.action_attn.out_proj.weight', 'model.decoder.layers.2.action_attn.out_proj.bias', 'model.decoder.layers.2.action_attn_layer_norm.weight', 'model.decoder.layers.2.action_attn_layer_norm.bias', 
'model.decoder.layers.2.composit_layer.weight', 'model.decoder.layers.2.composit_layer.bias', 'model.decoder.layers.2.composit_layer_norm.weight', 'model.decoder.layers.2.composit_layer_norm.bias', 'model.decoder.layers.3.resweight', 'model.decoder.layers.3.resweight_2', 'model.decoder.layers.3.discourse_attn.k_proj.weight', 'model.decoder.layers.3.discourse_attn.k_proj.bias', 'model.decoder.layers.3.discourse_attn.v_proj.weight', 'model.decoder.layers.3.discourse_attn.v_proj.bias', 'model.decoder.layers.3.discourse_attn.q_proj.weight', 'model.decoder.layers.3.discourse_attn.q_proj.bias', 'model.decoder.layers.3.discourse_attn.out_proj.weight', 'model.decoder.layers.3.discourse_attn.out_proj.bias', 'model.decoder.layers.3.discourse_attn_layer_norm.weight', 'model.decoder.layers.3.discourse_attn_layer_norm.bias', 'model.decoder.layers.3.action_attn.k_proj.weight', 'model.decoder.layers.3.action_attn.k_proj.bias', 'model.decoder.layers.3.action_attn.v_proj.weight', 'model.decoder.layers.3.action_attn.v_proj.bias', 'model.decoder.layers.3.action_attn.q_proj.weight', 'model.decoder.layers.3.action_attn.q_proj.bias', 'model.decoder.layers.3.action_attn.out_proj.weight', 'model.decoder.layers.3.action_attn.out_proj.bias', 'model.decoder.layers.3.action_attn_layer_norm.weight', 'model.decoder.layers.3.action_attn_layer_norm.bias', 'model.decoder.layers.3.composit_layer.weight', 'model.decoder.layers.3.composit_layer.bias', 'model.decoder.layers.3.composit_layer_norm.weight', 'model.decoder.layers.3.composit_layer_norm.bias', 'model.decoder.layers.4.resweight', 'model.decoder.layers.4.resweight_2', 'model.decoder.layers.4.discourse_attn.k_proj.weight', 'model.decoder.layers.4.discourse_attn.k_proj.bias', 'model.decoder.layers.4.discourse_attn.v_proj.weight', 'model.decoder.layers.4.discourse_attn.v_proj.bias', 'model.decoder.layers.4.discourse_attn.q_proj.weight', 'model.decoder.layers.4.discourse_attn.q_proj.bias', 'model.decoder.layers.4.discourse_attn.out_proj.weight', 
'model.decoder.layers.4.discourse_attn.out_proj.bias', 'model.decoder.layers.4.discourse_attn_layer_norm.weight', 'model.decoder.layers.4.discourse_attn_layer_norm.bias', 'model.decoder.layers.4.action_attn.k_proj.weight', 'model.decoder.layers.4.action_attn.k_proj.bias', 'model.decoder.layers.4.action_attn.v_proj.weight', 'model.decoder.layers.4.action_attn.v_proj.bias', 'model.decoder.layers.4.action_attn.q_proj.weight', 'model.decoder.layers.4.action_attn.q_proj.bias', 'model.decoder.layers.4.action_attn.out_proj.weight', 'model.decoder.layers.4.action_attn.out_proj.bias', 'model.decoder.layers.4.action_attn_layer_norm.weight', 'model.decoder.layers.4.action_attn_layer_norm.bias', 'model.decoder.layers.4.composit_layer.weight', 'model.decoder.layers.4.composit_layer.bias', 'model.decoder.layers.4.composit_layer_norm.weight', 'model.decoder.layers.4.composit_layer_norm.bias', 'model.decoder.layers.5.resweight', 'model.decoder.layers.5.resweight_2', 'model.decoder.layers.5.discourse_attn.k_proj.weight', 'model.decoder.layers.5.discourse_attn.k_proj.bias', 'model.decoder.layers.5.discourse_attn.v_proj.weight', 'model.decoder.layers.5.discourse_attn.v_proj.bias', 'model.decoder.layers.5.discourse_attn.q_proj.weight', 'model.decoder.layers.5.discourse_attn.q_proj.bias', 'model.decoder.layers.5.discourse_attn.out_proj.weight', 'model.decoder.layers.5.discourse_attn.out_proj.bias', 'model.decoder.layers.5.discourse_attn_layer_norm.weight', 'model.decoder.layers.5.discourse_attn_layer_norm.bias', 'model.decoder.layers.5.action_attn.k_proj.weight', 'model.decoder.layers.5.action_attn.k_proj.bias', 'model.decoder.layers.5.action_attn.v_proj.weight', 'model.decoder.layers.5.action_attn.v_proj.bias', 'model.decoder.layers.5.action_attn.q_proj.weight', 'model.decoder.layers.5.action_attn.q_proj.bias', 'model.decoder.layers.5.action_attn.out_proj.weight', 'model.decoder.layers.5.action_attn.out_proj.bias', 'model.decoder.layers.5.action_attn_layer_norm.weight', 
'model.decoder.layers.5.action_attn_layer_norm.bias', 'model.decoder.layers.5.composit_layer.weight', 'model.decoder.layers.5.composit_layer.bias', 'model.decoder.layers.5.composit_layer_norm.weight', 'model.decoder.layers.5.composit_layer_norm.bias', 'model.discourse_encoder.attention_0.W', 'model.discourse_encoder.attention_0.a', 'model.discourse_encoder.attention_0.one_hot_embedding.weight', 'model.discourse_encoder.attention_0.layer_norm.weight', 'model.discourse_encoder.attention_0.layer_norm.bias', 'model.discourse_encoder.attention_1.W', 'model.discourse_encoder.attention_1.a', 'model.discourse_encoder.attention_1.one_hot_embedding.weight', 'model.discourse_encoder.attention_1.layer_norm.weight', 'model.discourse_encoder.attention_1.layer_norm.bias', 'model.discourse_encoder.out_att.W', 'model.discourse_encoder.out_att.a', 'model.discourse_encoder.out_att.one_hot_embedding.weight', 'model.discourse_encoder.out_att.layer_norm.weight', 'model.discourse_encoder.out_att.layer_norm.bias', 'model.discourse_encoder.fc.weight', 'model.discourse_encoder.fc.bias', 'model.discourse_encoder.layer_norm.weight', 'model.discourse_encoder.layer_norm.bias', 'model.action_encoder.attention_0.W', 'model.action_encoder.attention_0.a', 'model.action_encoder.attention_0.layer_norm.weight', 'model.action_encoder.attention_0.layer_norm.bias', 'model.action_encoder.attention_1.W', 'model.action_encoder.attention_1.a', 'model.action_encoder.attention_1.layer_norm.weight', 'model.action_encoder.attention_1.layer_norm.bias', 'model.action_encoder.out_att.W', 'model.action_encoder.out_att.a', 'model.action_encoder.out_att.layer_norm.weight', 'model.action_encoder.out_att.layer_norm.bias', 'model.action_encoder.fc.weight', 'model.action_encoder.fc.bias', 'model.action_encoder.layer_norm.weight', 'model.action_encoder.layer_norm.bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Is this expected?
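One way to check whether the warning only concerns the added modules is to group the listed parameter names by component. The following is a minimal sketch, not part of the repository: it assumes the warning text has been copied into a string, the helper names `extract_param_names` and `summarize` are hypothetical, and the marker list is simply read off the names that appear in the warning above.

```python
import re
from collections import Counter

# Component names read off the parameter names in the warning (assumption:
# these are the modules added on top of the stock BART architecture).
CUSTOM_MARKERS = (
    "resweight",
    "discourse_attn",
    "action_attn",
    "composit_layer",
    "discourse_encoder",
    "action_encoder",
)


def extract_param_names(warning_text):
    """Return the quoted parameter names found inside the first [...] list."""
    match = re.search(r"\[(.*?)\]", warning_text, flags=re.DOTALL)
    if match is None:
        return []
    return re.findall(r"'([^']+)'", match.group(1))


def summarize(names):
    """Count names per custom component; unmatched names go under 'other'."""
    counts = Counter()
    for name in names:
        marker = next((m for m in CUSTOM_MARKERS if m in name), "other")
        counts[marker] += 1
    return counts


# Shortened stand-in for the full warning, for illustration only.
sample = (
    "Some weights of BartForConditionalGeneration were not initialized from "
    "the model checkpoint at facebook/bart-base and are newly initialized: "
    "['model.decoder.layers.0.resweight', "
    "'model.decoder.layers.0.discourse_attn.k_proj.weight', "
    "'model.action_encoder.fc.bias'] You should probably TRAIN this model"
)

counts = summarize(extract_param_names(sample))
print(dict(counts))
print("unexpected names present:", "other" in counts)
```

If every name falls under one of the markers and no `other` entry appears, the warning lists only the added layers, which have no counterpart in the `facebook/bart-base` checkpoint.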