I am trying to run your code on a fresh install of Ubuntu 20.04 with Python 3.9.5 and CUDA 11.6 / cuDNN 8.3.2, but executing main.py fails with the following cuDNN error:
$ python main.py
2022-01-21 16:02:17,793 INFO services.py:1272 -- View the Ray dashboard at http://127.0.0.1:8265
(pid=36888) A.L.E: Arcade Learning Environment (version +978d2ce)
(pid=36888) [Powered by Stella]
(pid=36874) A.L.E: Arcade Learning Environment (version +978d2ce)
(pid=36874) [Powered by Stella]
(pid=36881) A.L.E: Arcade Learning Environment (version +978d2ce)
(pid=36881) [Powered by Stella]
(pid=36885) A.L.E: Arcade Learning Environment (version +978d2ce)
(pid=36885) [Powered by Stella]
(pid=36882) A.L.E: Arcade Learning Environment (version +978d2ce)
(pid=36882) [Powered by Stella]
(pid=36875) A.L.E: Arcade Learning Environment (version +978d2ce)
(pid=36875) [Powered by Stella]
====================================================================================================
Traceback (most recent call last):
  File "/home/nate/Desktop/Atom/agent57_pytorch/main.py", line 267, in <module>
    main(parser.parse_args())
  File "/home/nate/Desktop/Atom/agent57_pytorch/main.py", line 144, in main
    in_q_weight, ex_q_weight, embed_weight, trained_lifelong_weight, indices, priorities, in_q_loss, ex_q_loss, embed_loss, lifelong_loss = ray.get(finished_learner[0])
  File "/home/nate/miniconda3/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 62, in wrapper
    return func(*args, **kwargs)
  File "/home/nate/miniconda3/lib/python3.9/site-packages/ray/worker.py", line 1495, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::Learner.update_network() (pid=36888, ip=192.168.137.71)
  File "python/ray/_raylet.pyx", line 501, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 451, in ray._raylet.execute_task.function_executor
  File "/home/nate/miniconda3/lib/python3.9/site-packages/ray/_private/function_manager.py", line 563, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "/home/nate/Desktop/Atom/agent57_pytorch/learner.py", line 262, in update_network
    priorities, in_q_loss, ex_q_loss = self.qnet_update(weights, segments)
  File "/home/nate/Desktop/Atom/agent57_pytorch/learner.py", line 308, in qnet_update
    ex_target_qvalues = self.get_qvalues(self.ex_target_q_network, ex_h0, ex_c0)
  File "/home/nate/Desktop/Atom/agent57_pytorch/learner.py", line 371, in get_qvalues
    _, (h, c) = q_network(self.states[t],
  File "/home/nate/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nate/Desktop/Atom/agent57_pytorch/model.py", line 99, in forward
    x, states = self.lstm(x.unsqueeze(0), states)
  File "/home/nate/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nate/miniconda3/lib/python3.9/site-packages/torch/nn/modules/rnn.py", line 679, in forward
    result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
Have you encountered an error like this during development? Are you using an older version of CUDA / cuDNN? Please let me know if you have any suggestions.
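In case it helps to compare environments, below is a minimal sketch of how the versions relevant to this error could be collected. The CUDA / cuDNN builds that PyTorch was compiled against can differ from the system-wide install listed above, so it may be worth checking both. The helper name `report_versions` is hypothetical and not part of this repository; it only uses standard PyTorch attributes and falls back gracefully if PyTorch is not importable.

```python
import platform


def report_versions():
    """Collect version info relevant to a cuDNN runtime error.

    Hypothetical helper for diagnosis only; not part of agent57_pytorch.
    """
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built with (None for CPU-only builds).
    info["torch_cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    # cuDNN version seen by PyTorch (None if cuDNN is unavailable).
    info["cudnn"] = torch.backends.cudnn.version()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in report_versions().items():
        print(f"{key}: {value}")
```

Running this in the same conda environment as main.py should show whether the PyTorch build matches the installed CUDA / cuDNN and whether the GPU is visible at all.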