Hi,
could you tell me which RLBench and YARR versions/tags are compatible with each other?
I believe PyTorch is behind most of the problems I am seeing, but none of the requirements.txt files state which version you use to make things work.
I observe this error:
Process train_env0:
Traceback (most recent call last):
File "/home/user/anaconda3/envs/ARM/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/user/anaconda3/envs/ARM/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/user/anaconda3/envs/ARM/lib/python3.9/site-packages/yarr/runners/_env_runner.py", line 169, in _run_env
raise e
File "/home/user/anaconda3/envs/ARM/lib/python3.9/site-packages/yarr/runners/_env_runner.py", line 143, in _run_env
for replay_transition in generator:
File "/home/user/anaconda3/envs/ARM/lib/python3.9/site-packages/yarr/utils/rollout_generator.py", line 35, in generator
transition = env.step(act_result)
File "/home/user/ARM/arm/custom_rlbench_env.py", line 128, in step
obs, reward, terminal = self._task.step(action)
File "/home/user/anaconda3/envs/ARM/lib/python3.9/site-packages/rlbench/task_environment.py", line 99, in step
self._action_mode.action(self._scene, action)
File "/home/user/anaconda3/envs/ARM/lib/python3.9/site-packages/rlbench/action_modes/action_mode.py", line 32, in action
arm_action = np.array(action[:arm_act_size])
File "/home/user/anaconda3/envs/ARM/lib/python3.9/site-packages/torch/_tensor.py", line 732, in __array__
return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
[2022-05-27 10:10:31,983][root][ERROR] - Env train_env0 failed too many times (11 times > 10)
Exception in thread EnvRunnerThread:
Traceback (most recent call last):
File "/home/user/anaconda3/envs/ARM/lib/python3.9/threading.py", line 973, in _bootstrap_inner
self.run()
File "/home/user/anaconda3/envs/ARM/lib/python3.9/threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "/home/user/anaconda3/envs/ARM/lib/python3.9/site-packages/yarr/runners/env_runner.py", line 134, in _run
raise RuntimeError('Too many process failures.')
RuntimeError: Too many process failures.
I pulled the current versions of RLBench and YARR and reinstalled all packages in a fresh conda environment.
I am wondering whether you use a different torch version that handles the tensor-to-numpy conversion more gracefully.
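As far as I can tell, though, no torch version converts a CUDA tensor to numpy implicitly; this minimal check (just my own repro, nothing from your code) fails the same way on the build I installed:

import numpy as np
import torch

t = torch.ones(8, device='cuda')
np.array(t)        # raises: can't convert cuda:0 device type tensor to numpy
np.array(t.cpu())  # works once the tensor is on the host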
For now, I have worked around it by adding .cpu() in a few files:
YARR
git diff main
diff --git a/yarr/envs/rlbench_env.py b/yarr/envs/rlbench_env.py
index 6aad118..6460fb1 100644
--- a/yarr/envs/rlbench_env.py
+++ b/yarr/envs/rlbench_env.py
@@ -6,7 +6,7 @@ try:
except (ModuleNotFoundError, ImportError) as e:
print("You need to install RLBench: 'https://github.com/stepjam/RLBench'")
raise e
-from rlbench.action_modes import ActionMode
+from rlbench.action_modes.action_mode import ActionMode
from rlbench.backend.observation import Observation
from rlbench.backend.task import Task
diff --git a/yarr/utils/rollout_generator.py b/yarr/utils/rollout_generator.py
index d4d2973..a3f12ee 100644
--- a/yarr/utils/rollout_generator.py
+++ b/yarr/utils/rollout_generator.py
@@ -27,7 +27,7 @@ class RolloutGenerator(object):
deterministic=eval)
# Convert to np if not already
- agent_obs_elems = {k: np.array(v) for k, v in
+ agent_obs_elems = {k: np.array(v.cpu()) for k, v in
act_result.observation_elements.items()}
extra_replay_elements = {k: np.array(v) for k, v in
act_result.replay_elements.items()}
@@ -66,7 +66,7 @@ class RolloutGenerator(object):
prepped_data = {k: torch.tensor([v], device=self._env_device) for k, v in obs_history.items()}
act_result = agent.act(step_signal.value, prepped_data,
deterministic=eval)
- agent_obs_elems_tp1 = {k: np.array(v) for k, v in
+ agent_obs_elems_tp1 = {k: np.array(v.cpu()) for k, v in
act_result.observation_elements.items()}
obs_tp1.update(agent_obs_elems_tp1)
replay_transition.final_observation = obs_tp1
(Side note: because of the recent changes to RLBench's folder structure, I also had to change the import for ActionMode, as shown above.)
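For context, this is the layout I am now importing from; the snippet below is only my own usage sketch against the refactored action_modes package, not code from ARM or YARR:

from rlbench.action_modes.action_mode import MoveArmThenGripper
from rlbench.action_modes.arm_action_modes import JointVelocity
from rlbench.action_modes.gripper_action_modes import Discrete

# Sketch of my usage of the refactored layout: arm and gripper action
# modes are separate objects combined by MoveArmThenGripper.
action_mode = MoveArmThenGripper(
    arm_action_mode=JointVelocity(),
    gripper_action_mode=Discrete())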
RLBench
git diff master
diff --git a/rlbench/action_modes/action_mode.py b/rlbench/action_modes/action_mode.py
index 68171a37..a2c264ef 100644
--- a/rlbench/action_modes/action_mode.py
+++ b/rlbench/action_modes/action_mode.py
@@ -29,8 +29,8 @@ class MoveArmThenGripper(ActionMode):
def action(self, scene: Scene, action: np.ndarray):
arm_act_size = np.prod(self.arm_action_mode.action_shape(scene))
- arm_action = np.array(action[:arm_act_size])
- ee_action = np.array(action[arm_act_size:])
+ arm_action = np.array(action[:arm_act_size].cpu())
+ ee_action = np.array(action[arm_act_size:].cpu())
self.arm_action_mode.action(scene, arm_action)
self.gripper_action_mode.action(scene, ee_action)
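An alternative I considered, instead of patching RLBench itself, is to convert the action on the ARM side before it ever reaches task.step, e.g. at the start of CustomRLBenchEnv.step. This is only a sketch of the idea; the helper name and placement are my own guess:

import numpy as np
import torch

def to_numpy_action(action):
    # Make sure the action handed to RLBench is a host-side numpy array,
    # regardless of whether the agent returned a CUDA tensor or an array.
    if torch.is_tensor(action):
        action = action.detach().cpu().numpy()
    return np.asarray(action)

Calling something like this before self._task.step(action) would keep the .cpu() handling out of RLBench, but I don't know whether that is the approach you intend.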
I suspect the error actually originates from a change somewhere else, or that you use a torch version that can deal with this. Could you please help me? I don't know which PyTorch version you are using; it is missing from the requirements.txt. I installed PyTorch with conda:
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
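For completeness, this is how I check what that command actually installed (I can post the output if it helps):

import torch

# Report the installed torch build, its bundled CUDA version, and GPU visibility.
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())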
An error that I have not been able to fix is this one:
Exception in thread EnvRunnerThread:
Traceback (most recent call last):
File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/threading.py", line 973, in _bootstrap_inner
self.run()
File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/site-packages/yarr/runners/env_runner.py", line 141, in _run
new_transitions = self._update()
File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/site-packages/yarr/runners/env_runner.py", line 86, in _update
self._agent_summaries = list(
File "<string>", line 2, in __getitem__
File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/multiprocessing/managers.py", line 825, in _callmethod
raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Unserializable message: Traceback (most recent call last):
File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/multiprocessing/managers.py", line 300, in serve_client
send(msg)
File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/multiprocessing/connection.py", line 211, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 249, in reduce_tensor
event_sync_required) = storage._share_cuda_()
File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/site-packages/torch/storage.py", line 623, in _share_cuda_
return self._storage._share_cuda_(*args, **kwargs)
RuntimeError: Attempted to send CUDA tensor received from another process; this is not currently supported. Consider cloning before sending.
---------------------------------------------------------------------------
[W CudaIPCTypes.cpp:92] Producer process tried to deallocate over 1000 memory blocks referred by consumer processes. Deallocation might be significantly slowed down. We assume it will never going to be the case, but if it is, please file but to https://github.com/pytorch/pytorch
Do you have any advice? It seems to me that PyTorch is behind most of the problems I mentioned.
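My current guess is that something in the agent summaries is still a CUDA tensor when it gets pickled through the manager, so the only workaround I can think of is to force everything to CPU before it crosses the process boundary. This is just a sketch of that idea; the helper and where it would be called are my own guess, not existing YARR code:

import torch

def detach_to_cpu(obj):
    # Recursively move any tensors to host memory before they are shared
    # across processes; CUDA tensors received from another process cannot
    # be re-sent, which matches the RemoteError above.
    if isinstance(obj, torch.Tensor):
        return obj.detach().cpu()
    if isinstance(obj, dict):
        return {k: detach_to_cpu(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(detach_to_cpu(v) for v in obj)
    return obj

But I am not sure where the offending tensor is created, so this feels like treating the symptom rather than the cause.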
Using: Python 3.9.12