This is my run script, `new_gpu_squad_bash.sh` (modified from `gpu_squad_base.sh`):
```bash
# local path
SQUAD_DIR=../SQUAD
INIT_CKPT_DIR=../xlnet_cased_L-12_H-768_A-12
PROC_DATA_DIR=proc_data/squad
MODEL_DIR=experiment/squad_new_gpu

# Single GPU, batch size 1, seqlen 512
# (the base script's comment said: 3 GPUs, each with 8 seqlen-512 samples)
python ../run_squad.py \
  --use_tpu=False \
  --num_hosts=1 \
  --num_core_per_host=1 \
  --model_config_path=${INIT_CKPT_DIR}/xlnet_config.json \
  --spiece_model_file=${INIT_CKPT_DIR}/spiece.model \
  --output_dir=${PROC_DATA_DIR} \
  --init_checkpoint=${INIT_CKPT_DIR}/xlnet_model.ckpt \
  --model_dir=${MODEL_DIR} \
  --train_file=${SQUAD_DIR}/small_train-v2.0.json \
  --predict_file=${SQUAD_DIR}/dev-v2.0.json \
  --uncased=False \
  --max_seq_length=512 \
  --do_train=True \
  --train_batch_size=1 \
  --do_predict=True \
  --predict_batch_size=1 \
  --learning_rate=2e-5 \
  --adam_epsilon=1e-6 \
  --iterations=1000 \
  --save_steps=1000 \
  --train_steps=12000 \
  --warmup_steps=1000 \
  "$@"
```
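For a rough sense of scale, the static part of the training memory can be estimated from the parameter count the log reports (`#params: 119082242`) and the float32 dtype it uses. This is a back-of-the-envelope sketch, not something taken from `run_squad.py`: it assumes four float32 copies of the model (weights, gradients, and the two Adam moment slots) and deliberately ignores activations, which at seqlen 512 come on top of this.

```python
# Rough static-memory estimate for fine-tuning. Assumptions (not from XLNet):
# float32 tensors, and an Adam-style optimizer keeping two moment slots per
# variable. Activations, workspace buffers and allocator overhead are NOT included.

BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def static_training_bytes(num_params: int, copies: int = 4) -> int:
    """Bytes for `copies` float32 tensors the size of the model.

    copies=4 assumes weights + gradients + Adam first and second moments.
    """
    return num_params * BYTES_PER_FLOAT32 * copies


if __name__ == "__main__":
    num_params = 119_082_242  # "#params" line from the training log
    total_gib = 11.77         # "totalMemory" reported for the RTX 3080 Ti
    weights_gib = static_training_bytes(num_params, copies=1) / GIB
    static_gib = static_training_bytes(num_params) / GIB
    print(f"weights only:           {weights_gib:.2f} GiB")
    print(f"weights + grads + Adam: {static_gib:.2f} GiB of {total_gib} GiB")
```

Under these assumptions it prints about 0.44 GiB for the weights and 1.77 GiB in total, a small fraction of the 11.77 GiB the card reports, so the static footprint alone is unlikely to exhaust the GPU; activation memory is not covered by this estimate.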
Run command: `CUDA_VISIBLE_DEVICES=0 bash new_gpu_squad_bash.sh`

GPU memory should be sufficient:

![Screenshot 2022-08-18 15-03-03](https://user-images.githubusercontent.com/59367257/185326924-a7572b68-c36f-4096-aaa6-3f2be35fbb26.png)

However, the program reports an error. Full output:
(tensorflow_gpu_1_13) zaisen_ye@ubuntu-DeepLearning-2602056:/data/zaisen_ye/xlnet-master/scripts$ CUDA_VISIBLE_DEVICES=0 bash new_gpu_squad_bash.sh
2022-08-18 15:00:46.353987: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2022-08-18 15:00:47.049036: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x557600f5b2d0 executing computations on platform CUDA. Devices:
2022-08-18 15:00:47.049082: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): NVIDIA GeForce RTX 3080 Ti, Compute Capability 8.6
2022-08-18 15:00:47.051712: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2022-08-18 15:00:47.054733: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x557600fd8c20 executing computations on platform Host. Devices:
2022-08-18 15:00:47.054752: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2022-08-18 15:00:47.054866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: NVIDIA GeForce RTX 3080 Ti major: 8 minor: 6 memoryClockRate(GHz): 1.665
pciBusID: 0000:4f:00.0
totalMemory: 11.77GiB freeMemory: 11.53GiB
2022-08-18 15:00:47.054879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2022-08-18 15:00:47.057695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-08-18 15:00:47.057725: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2022-08-18 15:00:47.057734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2022-08-18 15:00:47.057839: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11215 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:4f:00.0, compute capability: 8.6)
INFO:tensorflow:Single device mode.
WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
- https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
- https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fa1087c9210>, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=1, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_tf_random_seed': None, '_device_fn': None, '_cluster': None, '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_evaluation_master': '', '_eval_distribute': None, '_train_distribute': None, '_session_config': allow_soft_placement: true
, '_global_id_in_cluster': 0, '_is_chief': True, '_protocol': None, '_save_checkpoints_steps': 1000, '_experimental_distribute': None, '_save_summary_steps': 100, '_model_dir': 'experiment/squad_new_gpu', '_master': ''}
WARNING:tensorflow:Estimator's model_fn (<function model_fn at 0x7f9fa03be3d0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Input tfrecord file glob proc_data/squad/spiece.model..slen-512.qlen-64.train.tf_record
INFO:tensorflow:Find 1 input paths ['proc_data/squad/spiece.model.0.slen-512.qlen-64.train.tf_record']
WARNING:tensorflow:From /home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From ../run_squad.py:1019: map_and_batch (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.experimental.map_and_batch(...).
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:memory input None
INFO:tensorflow:Use float type <dtype: 'float32'>
WARNING:tensorflow:From /data/zaisen_ye/xlnet-master/modeling.py:534: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
WARNING:tensorflow:From /home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.
WARNING:tensorflow:From /data/zaisen_ye/xlnet-master/modeling.py:67: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
INFO:tensorflow:#params: 119082242
WARNING:tensorflow:From /home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/learning_rate_decay_v2.py:321: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
WARNING:tensorflow:From /home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/ff/LayerNorm/gamma:0
INFO:tensorflow:Initialize from the ckpt ../xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt
INFO:tensorflow:*** Global Variables ****
INFO:tensorflow: name = model/transformer/r_w_bias:0, shape = (12, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/r_r_bias:0, shape = (12, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/word_embedding/lookup_table:0, shape = (32000, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/r_s_bias:0, shape = (12, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/seg_embed:0, shape = (12, 2, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = start_logits/dense/kernel:0, shape = (768, 1)
INFO:tensorflow: name = start_logits/dense/bias:0, shape = (1,)
INFO:tensorflow: name = end_logits/dense_0/kernel:0, shape = (1536, 768)
INFO:tensorflow: name = end_logits/dense_0/bias:0, shape = (768,)
INFO:tensorflow: name = end_logits/LayerNorm/beta:0, shape = (768,)
INFO:tensorflow: name = end_logits/LayerNorm/gamma:0, shape = (768,)
INFO:tensorflow: name = end_logits/dense_1/kernel:0, shape = (768, 1)
INFO:tensorflow: name = end_logits/dense_1/bias:0, shape = (1,)
INFO:tensorflow: name = answer_class/dense_0/kernel:0, shape = (1536, 768)
INFO:tensorflow: name = answer_class/dense_0/bias:0, shape = (768,)
INFO:tensorflow: name = answer_class/dense_1/kernel:0, shape = (768, 1)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2022-08-18 15:01:06.954873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2022-08-18 15:01:06.954938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-08-18 15:01:06.954945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2022-08-18 15:01:06.954951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2022-08-18 15:01:06.955044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11215 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:4f:00.0, compute capability: 8.6)
WARNING:tensorflow:From /home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from experiment/squad_new_gpu/model.ckpt-0
WARNING:tensorflow:From /home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into experiment/squad_new_gpu/model.ckpt.
2022-08-18 15:05:02.150154: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2022-08-18 15:05:17.835563: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_EXECUTION_FAILED
2022-08-18 15:05:17.835625: E tensorflow/stream_executor/cuda/cuda_blas.cc:2620] Internal: failed BLAS call, see log for details
2022-08-18 15:05:17.835644: I tensorflow/stream_executor/stream.cc:5027] [stream=0x557603b483d0,impl=0x557603b391d0] did not memcpy host-to-device; source: 0x7f9cb801b970
2022-08-18 15:05:17.835677: E tensorflow/stream_executor/cuda/cuda_blas.cc:2620] Internal: failed to copy memory from host to device in CUDABlas::DoBlasGemmBatched
Traceback (most recent call last):
File "../run_squad.py", line 1317, in <module>
tf.app.run()
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "../run_squad.py", line 1216, in main
estimator.train(input_fn=train_input_fn, max_steps=FLAGS.train_steps)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model_default
saving_listeners)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1407, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 676, in run
run_metadata=run_metadata)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1171, in run
run_metadata=run_metadata)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1270, in run
raise six.reraise(*original_exc_info)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
return self._sess.run(*args, **kwargs)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1327, in run
run_metadata=run_metadata)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1091, in run
return self._sess.run(*args, **kwargs)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[12,512,64], b.shape=[12,64,512], m=512, n=512, k=64, batch_size=12
[[node model/transformer/layer_0/rel_attn/einsum_4/MatMul (defined at /data/zaisen_ye/xlnet-master/modeling.py:133) ]]
[[node add_1 (defined at ../run_squad.py:1088) ]]
Caused by op u'model/transformer/layer_0/rel_attn/einsum_4/MatMul', defined at:
File "../run_squad.py", line 1317, in <module>
tf.app.run()
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "../run_squad.py", line 1216, in main
estimator.train(input_fn=train_input_fn, max_steps=FLAGS.train_steps)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1154, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "../run_squad.py", line 1033, in model_fn
outputs = function_builder.get_qa_outputs(FLAGS, features, is_training)
File "/data/zaisen_ye/xlnet-master/function_builder.py", line 230, in get_qa_outputs
input_mask=inp_mask)
File "/data/zaisen_ye/xlnet-master/xlnet.py", line 222, in __init__
) = modeling.transformer_xl(**tfm_args)
File "/data/zaisen_ye/xlnet-master/modeling.py", line 628, in transformer_xl
reuse=reuse)
File "/data/zaisen_ye/xlnet-master/modeling.py", line 309, in rel_multihead_attn
r_r_bias, r_s_bias, attn_mask, dropatt, is_training, scale)
File "/data/zaisen_ye/xlnet-master/modeling.py", line 133, in rel_attn_core
ac = tf.einsum('ibnd,jbnd->ijbn', q_head + r_w_bias, k_head_h)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/ops/special_math_ops.py", line 262, in einsum
axes_to_sum)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/ops/special_math_ops.py", line 394, in _einsum_reduction
product = math_ops.matmul(t0, t1)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 2417, in matmul
a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1423, in batch_mat_mul
"BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()
InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[12,512,64], b.shape=[12,64,512], m=512, n=512, k=64, batch_size=12
[[node model/transformer/layer_0/rel_attn/einsum_4/MatMul (defined at /data/zaisen_ye/xlnet-master/modeling.py:133) ]]
[[node add_1 (defined at ../run_squad.py:1088) ]]
`
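For reference, the failing node is the einsum `ibnd,jbnd->ijbn` in `rel_attn_core` (modeling.py:133). The small sketch below maps that einsum onto the batched-GEMM shapes in the error message. It assumes TF 1.13 folds the axes shared by both inputs and the output (`b` and `n`) into the GEMM batch and contracts over `d`; the helper names are my own, not XLNet or TensorFlow APIs. With `max_seq_length=512`, `train_batch_size=1` and 12 heads of size 64 (from the `(768, 12, 64)` kernels above), it reproduces the reported shapes, and the float32 output of this matmul is only about 12 MiB, which seems consistent with GPU memory not being the problem.

```python
# Sketch: how the einsum 'ibnd,jbnd->ijbn' from rel_attn_core
# (modeling.py:133) maps onto the batched GEMM named in the error.
# ASSUMPTION: axes b and n (present in both inputs and the output) are
# folded into the GEMM batch, and d is the contracted axis k.
# Helper names are hypothetical, not XLNet/TensorFlow APIs.


def einsum_to_batched_gemm(seq_len, batch, n_head, d_head):
    """Return (a_shape, b_shape, m, n, k, gemm_batch) for 'ibnd,jbnd->ijbn'."""
    gemm_batch = batch * n_head                # folded axes b and n
    a_shape = [gemm_batch, seq_len, d_head]    # q_head viewed as [b*n, i, d]
    b_shape = [gemm_batch, d_head, seq_len]    # k_head viewed as [b*n, d, j]
    return a_shape, b_shape, seq_len, seq_len, d_head, gemm_batch


def describe(seq_len, batch, n_head, d_head):
    """Format the shapes the same way the TensorFlow error message does."""
    a, b, m, n, k, bs = einsum_to_batched_gemm(seq_len, batch, n_head, d_head)

    def fmt(shape):
        return "[" + ",".join(str(x) for x in shape) + "]"

    return "a.shape=%s, b.shape=%s, m=%d, n=%d, k=%d, batch_size=%d" % (
        fmt(a), fmt(b), m, n, k, bs)


def output_bytes(seq_len, batch, n_head, bytes_per_elem=4):
    """Size of the [b*n, i, j] GEMM output (float32 by default)."""
    return batch * n_head * seq_len * seq_len * bytes_per_elem


if __name__ == "__main__":
    # max_seq_length=512, train_batch_size=1, 12 heads of size 64
    print(describe(512, 1, 12, 64))
    print("output: %.1f MiB" % (output_bytes(512, 1, 12) / 1048576.0))
```

Running it prints `a.shape=[12,512,64], b.shape=[12,64,512], m=512, n=512, k=64, batch_size=12`, the same shapes as in the InternalError, followed by `output: 12.0 MiB`.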
The packages used in the environment are as follows (tensorflow 1.13.1, sentencepiece 0.1.91, cudatoolkit 10.0.130, cudnn 7.3.1):
```
(tensorflow_gpu_1_13) zaisen_ye@ubuntu-DeepLearning-2602056:/data/zaisen_ye/xlnet-master/scripts$ conda list
# packages in environment at /home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13:
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
_tflow_select 2.1.0 gpu
absl-py 0.15.0 pyhd3eb1b0_0
astor 0.8.1 pypi_0 pypi
backports 1.1 pyhd3eb1b0_0
backports-weakref 1.0.post1 pypi_0 pypi
backports.weakref 1.0.post1 py_1
blas 1.0 mkl
c-ares 1.18.1 h7f8727e_0
ca-certificates 2022.07.19 h06a4308_0
certifi 2020.6.20 pyhd3eb1b0_3
cudatoolkit 10.0.130 0
cudnn 7.3.1 cuda10.0_0
cupti 10.0.130 0
enum34 1.1.10 pypi_0 pypi
funcsigs 1.0.2 pypi_0 pypi
futures 3.3.0 py27_0
gast 0.5.3 pyhd3eb1b0_0
grpcio 1.41.1 pypi_0 pypi
h5py 2.10.0 pypi_0 pypi
hdf5 1.10.4 hb1b8bf9_0
intel-openmp 2022.0.1 h06a4308_3633
keras-applications 1.0.8 py_1
keras-preprocessing 1.1.2 pypi_0 pypi
libffi 3.3 he6710b0_2
libgcc-ng 11.2.0 h1234567_1
libgfortran-ng 7.5.0 ha8ba4b0_17
libgfortran4 7.5.0 ha8ba4b0_17
libgomp 11.2.0 h1234567_1
libprotobuf 3.11.2 hd408876_0
libstdcxx-ng 11.2.0 h1234567_1
linecache2 1.0.0 py_1
markdown 3.1.1 py27_0
mkl 2020.2 256
mkl-service 2.3.0 py27he904b0f_0
mkl_fft 1.0.15 py27ha843d7b_0
mkl_random 1.1.0 py27hd6b4f25_0
mock 3.0.5 py27_0
ncurses 6.3 h5eee18b_3
numpy 1.16.6 pypi_0 pypi
numpy-base 1.16.6 py27hde5b4d6_0
openssl 1.1.1q h7f8727e_0
pip 19.3.1 py27_0
protobuf 3.17.3 pypi_0 pypi
python 2.7.18 ha1903f6_2
readline 8.1.2 h7f8727e_1
scipy 1.2.1 py27h7c811a0_0
sentencepiece 0.1.91 pypi_0 pypi
setuptools 44.0.0 py27_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.39.2 h5082296_0
tensorboard 1.13.1 py27hf484d3e_0
tensorflow 1.13.1 gpu_py27hcb41dfa_0
tensorflow-estimator 1.13.0 py_0
tensorflow-gpu 1.13.1 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
traceback2 1.4.0 py27_0
unittest2 1.1.0 py27_0
werkzeug 1.0.1 pyhd3eb1b0_0
wheel 0.37.1 pyhd3eb1b0_0
zlib 1.2.12 h7f8727e_2
```
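Comparing this list with the log: the GPU is reported as compute capability 8.6 (RTX 3080 Ti), while the environment uses cudatoolkit 10.0.130. As a quick sanity check, here is a small sketch that compares a toolkit version with the minimum CUDA release I understand each compute capability to need. The table is my own assumption based on NVIDIA's CUDA release notes (for example, 8.6 first being supported in CUDA 11.1), not something from XLNet or TensorFlow, and the function names are hypothetical.

```python
# Sketch: does a CUDA toolkit version support a GPU compute capability?
# ASSUMPTION: the minimum-toolkit table below reflects my reading of
# NVIDIA's CUDA release notes; verify it against the official docs.

MIN_CUDA_FOR_CC = {
    (6, 1): (8, 0),    # Pascal, e.g. GTX 10xx
    (7, 0): (9, 0),    # Volta, e.g. V100
    (7, 5): (10, 0),   # Turing, e.g. RTX 20xx
    (8, 0): (11, 0),   # Ampere, A100
    (8, 6): (11, 1),   # Ampere, e.g. RTX 30xx
}


def major_minor(version):
    """'10.0.130' -> (10, 0); '8.6' -> (8, 6)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def toolkit_supports(cudatoolkit_version, compute_capability):
    """True if the toolkit is at least the minimum listed for this GPU."""
    cc = major_minor(compute_capability)
    if cc not in MIN_CUDA_FOR_CC:
        raise ValueError("compute capability %s not in table" % compute_capability)
    return major_minor(cudatoolkit_version) >= MIN_CUDA_FOR_CC[cc]


if __name__ == "__main__":
    # Values from this environment: cudatoolkit 10.0.130, compute capability 8.6
    print(toolkit_supports("10.0.130", "8.6"))
```

For this environment it prints `False`. I am not sure whether that is the actual cause of the cuBLAS failure, so any confirmation would help.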
PS: I know my problem is similar to this question: https://stackoverflow.com/questions/43990046/tensorflow-blas-gemm-launch-failed, but it was not solved there, and I'm not sure that question is clear enough or describes exactly the same problem as mine, so I'm posting it with my own error message. I also believe this problem is different from https://stackoverflow.com/questions/50911052/tensorflow-matmul-blas-xgemmbatched-launch-failed.
@zihangdai