Thank you for open-sourcing this!
When I run generate.py on a GPU, I get the following error.
2022-01-18 20:50:55.799119: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2022-01-18 20:52:03.633057: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2022-01-18 20:52:03.633118: E tensorflow/stream_executor/cuda/cuda_blas.cc:2301] Internal: failed BLAS call, see log for details
File "/usr/local/miniconda3/lib/python3.6/site-packages/bert4keras/snippets.py", line 627, in random_sample
inputs, output_ids, states, temperature, 'probas'
File "/usr/local/miniconda3/lib/python3.6/site-packages/bert4keras/snippets.py", line 525, in new_predict
prediction = predict(self, inputs, output_ids, states)
File "example_generate.py", line 52, in predict
return self.last_token(seq2seq).predict([token_ids, segment_ids])
File "/usr/local/miniconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1462, in predict
callbacks=callbacks)
File "/usr/local/miniconda3/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 324, in predict_loop
batch_outs = f(ins_batch)
File "/usr/local/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
run_metadata=self.run_metadata)
File "/usr/local/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas xGEMMBatched launch failed : a.shape=[12,11,64], b.shape=[12,64,11], m=11, n=11, k=64, batch_size=12
[[{{node Transformer-0-MultiHeadSelfAttention/einsum/MatMul}}]]
[[lambda_1/strided_slice/_1229]]
(1) Internal: Blas xGEMMBatched launch failed : a.shape=[12,11,64], b.shape=[12,64,11], m=11, n=11, k=64, batch_size=12
[[{{node Transformer-0-MultiHeadSelfAttention/einsum/MatMul}}]]
0 successful operations.
0 derived errors ignored.
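For reference, the shapes in the error line can be read off with a few lines of Python. This is only a rough sketch: `parse_gemm_error` and `operand_bytes` are hypothetical helper names, and float32 operands are an assumption. The shapes look like 12 attention heads, 11 tokens, and a head size of 64, which would match the `MultiHeadSelfAttention` node name. If that reading is right, each operand is only a few tens of KB, so running out of GPU memory seems an unlikely cause.

```python
import re

# Error line copied from the log above.
ERROR_LINE = (
    "Blas xGEMMBatched launch failed : a.shape=[12,11,64], "
    "b.shape=[12,64,11], m=11, n=11, k=64, batch_size=12"
)


def parse_gemm_error(line):
    """Extract operand shapes and GEMM sizes from an xGEMMBatched error line."""
    # Matches "a.shape=[12,11,64]" style fragments.
    shapes = {
        name: [int(d) for d in dims.split(",")]
        for name, dims in re.findall(r"(\w)\.shape=\[([\d,]+)\]", line)
    }
    # Matches "m=11", "k=64", "batch_size=12" style fragments.
    sizes = {key: int(val) for key, val in re.findall(r"(\w+)=(\d+)", line)}
    return shapes, sizes


def operand_bytes(shape, bytes_per_element=4):
    """Size of one operand in bytes; 4 bytes per element assumes float32."""
    total = bytes_per_element
    for dim in shape:
        total *= dim
    return total


shapes, sizes = parse_gemm_error(ERROR_LINE)
for name, shape in sorted(shapes.items()):
    print(name, shape, operand_bytes(shape), "bytes")
print(sizes)
```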
After some googling, I found suggestions to add the following code:
# Option 1
import tensorflow as tf
cfg = tf.ConfigProto()
cfg.gpu_options.allow_growth = True
cfg.gpu_options.per_process_gpu_memory_fraction = 0.90
sess = tf.Session(config=cfg)
# Option 2
import keras
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # let TensorFlow allocate GPU memory on demand
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # cap the fraction of GPU memory this process may use
keras.backend.tensorflow_backend.set_session(tf.Session(config=config))
With either option I still get the same error shown above. My GPU is an A100-SXM4-40GB, and the installed packages are:
Package Version
absl-py 0.15.0
asn1crypto 0.24.0
astor 0.8.1
astunparse 1.6.3
bert4keras 0.10.6
cached-property 1.5.2
cachetools 4.2.4
certifi 2018.4.16
cffi 1.11.5
chardet 3.0.4
charset-normalizer 2.0.10
clang 5.0
conda 4.5.4
cryptography 2.2.2
flatbuffers 1.12
gast 0.4.0
google-auth 1.35.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.43.0
h5py 3.1.0
idna 2.6
importlib-metadata 1.6.0
Keras 2.3.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
Markdown 3.2.2
numpy 1.19.5
oauthlib 3.1.1
opt-einsum 3.3.0
pip 20.1
protobuf 3.12.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycosat 0.6.3
pycparser 2.18
pyOpenSSL 18.0.0
PySocks 1.6.8
PyYAML 6.0
requests 2.27.1
requests-oauthlib 1.3.0
rsa 4.8
ruamel-yaml 0.15.37
scipy 1.5.4
setuptools 46.4.0
six 1.15.0
tensorboard 1.14.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow-estimator 1.14.0
tensorflow-gpu 1.14.0
termcolor 1.1.0
typing-extensions 3.7.4.3
urllib3 1.22
Werkzeug 1.0.1
wheel 0.37.1
wrapt 1.12.1
zipp 3.1.0
I tested my GPU with the following code and found that it is usable.
# Check whether the GPU is available
import tensorflow as tf
print(tf.test.is_gpu_available())
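A caveat on that check: `tf.test.is_gpu_available()` only reports whether a GPU device is visible to TensorFlow. As far as I can tell, it does not launch a cuBLAS kernel. Below is a minimal sketch of a stricter test that runs a batched matmul with the same shapes as the failing call, outside bert4keras. The function name `cublas_smoke_test` is hypothetical. The inputs are fed through placeholders so that the matmul is executed at run time rather than folded into a constant. One thing such a test might help confirm (an assumption on my part, not something established here) is a toolkit mismatch: the log shows libcublas.so.10.0 being loaded, while the A100 is an Ampere-generation card that needs CUDA 11 or later.

```python
def cublas_smoke_test():
    """Run one batched matmul; return True/False, or None if TensorFlow is absent."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    import numpy as np  # TensorFlow depends on NumPy, so it is available here

    graph = tf.Graph()
    with graph.as_default():
        # Same shapes as in the error: a=[12,11,64], b=[12,64,11].
        a = tf.compat.v1.placeholder(tf.float32, [12, 11, 64])
        b = tf.compat.v1.placeholder(tf.float32, [12, 64, 11])
        product = tf.matmul(a, b)  # batched GEMM; placed on the GPU if one is usable

    config = tf.compat.v1.ConfigProto()
    config.gpu_options.allow_growth = True
    feed = {
        a: np.ones([12, 11, 64], dtype=np.float32),
        b: np.ones([12, 64, 11], dtype=np.float32),
    }
    try:
        with tf.compat.v1.Session(graph=graph, config=config) as sess:
            out = sess.run(product, feed_dict=feed)
    except tf.errors.InternalError as err:
        print("batched matmul failed:", err)
        return False
    # Each output entry is a sum of 64 ones.
    return out.shape == (12, 11, 11) and bool((out == 64.0).all())


print("cuBLAS smoke test:", cublas_smoke_test())
```

If this fails with the same `Blas xGEMMBatched launch failed` message, the problem would seem to lie in the TensorFlow/CUDA installation rather than in generate.py or the session configuration.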
Could you please advise on how to run generate.py on the GPU?
Many thanks, and Happy New Year in advance!