Attention-driven Robot Manipulation (ARM) which includes Q-attention

Stephen James

Last update: Dec 29, 2022

Related tags

Deep Learning ARM

Overview

Attention-driven Robotic Manipulation (ARM)

This codebase is home to:

Q-attention: Enabling Efficient Learning for Vision-based Robotic Manipulation

Installation

ARM is trained using the YARR framework. Head to the YARR github page and follow installation instructions.

ARM is evaluated on RLBench 1.1.0. Head to the RLBench github page and follow installation instructions.

Now install project requirements:

pip install -r requirements.txt

Running experiments

Be sure to have RLBench demos saved on your machine before proceeding. To generate demos for a task, go to the tools directory in RLBench (rlbench/tools), and run:

python dataset_generator.py --save_path=/mnt/my/save/dir --tasks=take_lid_off_saucepan --image_size=128,128 \
--renderer=opengl --episodes_per_task=100 --variations=1 --processes=1

Experiments are launched via Hydra. To start training an agent to accomplish take_lid_off_saucepan with the default parameters on gpu 0, then run:

python launch.py method=ARM rlbench.task=take_lid_off_saucepan rlbench.demo_path=/mnt/my/save/dir framework.gpu=0

Comments

RuntimeError: Broken dataset assumption
Hi,

I am having trouble in running this command:

python launch.py method=C2FARM rlbench.task=take_lid_off_saucepan rlbench.demo_path=/home/RLBench/tools/mnt/my/save/dir framework.gpu=0

And it shows me this:

/home/itu/anaconda3/envs/env/lib/python3.6/site-packages/setuptools/distutils_patch.py:26: UserWarning: Distutils was imported before Setuptools. This usage is discouraged and may exhibit undesirable behaviors or errors. Please use Setuptools' objects directly or at least import Setuptools first. "Distutils was imported before Setuptools. This usage is discouraged " launch.py:330: UserWarning: The version_base parameter is not specified. Please specify a compatability version level, or None. Will assume defaults for version 1.1 @hydra.main(config_name='config', config_path='conf') /home/itu/anaconda3/envs/env/lib/python3.6/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'config': Defaults list is missing _self_. See https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order for more information warnings.warn(msg, UserWarning) /home/itu/anaconda3/envs/env/lib/python3.6/site-packages/hydra/core/default_element.py:128: UserWarning: In 'method/C2FARM': Usage of deprecated keyword in package header '# @package group'. See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/changes_to_package_header for more information See {url} for more information""" /home/itu/anaconda3/envs/env/lib/python3.6/site-packages/hydra/_internal/hydra.py:127: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information. configure_logging=with_log_configuration, [2022-11-18 19:09:44,691][root][INFO] - method: name: C2FARM lr: 0.0005 tau: 0.0025 activation: lrelu norm: None lambda_weight_l2: 1.0e-06 lambda_trans_qreg: 1.0 lambda_rot_qreg: 1.0 rotation_resolution: 5 image_crop_size: 64 bounds_offset:

0.15 voxel_sizes:

16

16 crop_augmentation: true demo_augmentation: true demo_augmentation_every_n: 10 exploration_strategy: gaussian rlbench: task: take_lid_off_saucepan demos: 100 demo_path: /home/itu/RL_FYP/RLBench/tools/mnt/my/save/dir episode_length: 100 cameras:

front camera_resolution:

128

128 scene_bounds:

-0.3

-0.5

0.6

0.7

0.5

1.6 replay: batch_size: 128 timesteps: 1 prioritisation: true use_disk: false path: /tmp/arm/replay framework: log_freq: 100 save_freq: 100 train_envs: 1 eval_envs: 1 replay_ratio: 128 transitions_before_train: 200 tensorboard_logging: true csv_logging: true training_iterations: 40000 gpu: 0 env_gpu: 0 logdir: /tmp/arm_test/ seeds: 1

[2022-11-18 19:09:44,722][root][INFO] - Using training device cuda:0. [2022-11-18 19:09:44,722][root][INFO] - Using env device cuda:0. [2022-11-18 19:09:44,776][root][INFO] - CWD:/tmp/arm_test/take_lid_off_saucepan/C2FARM [2022-11-18 19:09:44,777][root][INFO] - Starting seed 0. [2022-11-18 19:09:44,777][root][INFO] - Creating a PrioritizedReplayBuffer replay memory with the following parameters: [2022-11-18 19:09:44,778][root][INFO] - timesteps: 1 [2022-11-18 19:09:44,778][root][INFO] - replay_capacity: 100000 [2022-11-18 19:09:44,778][root][INFO] - batch_size: 128 [2022-11-18 19:09:44,778][root][INFO] - update_horizon: 1 [2022-11-18 19:09:44,778][root][INFO] - gamma: 0.990000 [2022-11-18 19:09:44,778][root][INFO] - saving to RAM [2022-11-18 19:09:44,783][root][INFO] - Filling replay with demos... Error executing job with overrides: ['method=C2FARM', 'rlbench.task=take_lid_off_saucepan', 'rlbench.demo_path=/home/itu/RL_FYP/RLBench/tools/mnt/my/save/dir', 'framework.gpu=0'] Traceback (most recent call last): File "launch.py", line 369, in main run_seed(cfg, env, cfg.rlbench.cameras, train_device, env_device, seed) File "launch.py", line 113, in run_seed cfg.method.rotation_resolution, cfg.method.crop_augmentation) File "/home/itu/RL_FYP/ARM/arm/c2farm/launch_utils.py", line 178, in fill_replay from_episode_number=d_idx)[0] File "/home/itu/anaconda3/envs/env/lib/python3.6/site-packages/rlbench/environment.py", line 161, in get_demos task_name, self._obs_config, random_selection, from_episode_number) File "/home/itu/anaconda3/envs/env/lib/python3.6/site-packages/rlbench/utils.py", line 100, in get_stored_demos raise RuntimeError('Broken dataset assumption') RuntimeError: Broken dataset assumption

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

We tried changing the path and other properties like episodes and length of episodes from config.yaml file, because we had 100 episodes in RLBench folder, but the same problem persists.

Please help me out in getting this resolved.
opened by RAFAY-AAMIR-GULL 3
About the version of RLBench
Hi. I think ARM is an excellent work for the intelligent robot. When I ran the code with the latest RLBench 1.1.0 release, I met some errors about RLBench.

Traceback (most recent call last): File "launch.py", line 17, in <module> from rlbench.action_modes.action_mode import MoveArmThenGripper ModuleNotFoundError: No module named 'rlbench.action_modes.action_mode'; 'rlbench.action_modes' is not a package

It seems that the latest RLBench 1.1.0 release doesn't match the code. Can you help us with this situation? Thanks in advance.
opened by weixiang-smart 2
Request for the source code of the real robot experiments

Dear Stephen,

thank you for sharing your interesting work! I would like to use your concept to simulate my own use cases with a KUKA iiwa in my own environment and then test it in a real environment. So I would like to know how I can change the model and the environment in the simulation. Additionally, I would like to ask for the code for the real experiments so that I can adapt it for my use cases and test it with my real system.

Thanks a lot for your help and best regards Marcel

opened by Marcelbgit 1
question about calculate the attended coordinates in the demo trajectories

Hi,

Thanks for sharing this interesting project. However, I got confused about the voxel center calculation. However, it seems that the voxel center calculation is not consistent between demo data and policy inference (see the code below).

https://github.com/stepjam/ARM/blob/c4285b6404702a97b3c5ac463af34c626f926b57/arm/c2farm/launch_utils.py#L96 https://github.com/stepjam/ARM/blob/c4285b6404702a97b3c5ac463af34c626f926b57/arm/c2farm/qattention_agent.py#L424

Just wondering whether it is a small bug or I misunderstand something.

Thanks a lot.

opened by WeiChengTseng 1

Code crashes when trying to run.

Hi Stephen,

Interesting work!! I was trying to run the launch script and facing some issues with it. Basically, although it runs, it consistently crashes for me during training. I have attached the crash and the VREP crash log below.

It seems that calling env.step from the code results in PyRep arms/arm.py which calls sim.simGetConfigForTipPose with an invalid ik-group argument. I manually verified that this value is 2030003 but I am not sure why it is invalid or why this error crops up. I have separately checked my PyRep installation and I can run the example scripts, so I am not really sure what's wrong.

I would appreciate if you could help me with it.

Process train_env0:                                                                                                                                                                                            
Traceback (most recent call last):                                                                                                                                                                             
  File "/home/mohit/anaconda3/envs/arm/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap                                                                                                      
    self.run()                                                                                                                                                                                                 
  File "/home/mohit/anaconda3/envs/arm/lib/python3.7/multiprocessing/process.py", line 99, in run                                                                                                              
    self._target(*self._args, **self._kwargs)                                                                                                                                                                  
  File "/home/mohit/anaconda3/envs/arm/lib/python3.7/site-packages/yarr/runners/_env_runner.py", line 167, in _run_env                                                                                         
    raise e                                                                                                                                                                                                    
  File "/home/mohit/anaconda3/envs/arm/lib/python3.7/site-packages/yarr/runners/_env_runner.py", line 141, in _run_env                                                                                         
    for replay_transition in generator:                                                                                                                                                                        
  File "/home/mohit/anaconda3/envs/arm/lib/python3.7/site-packages/yarr/utils/rollout_generator.py", line 34, in generator                                                                                     
    transition = env.step(act_result)                                                                                                                                                                          
  File "/home/mohit/ARM/arm/custom_rlbench_env.py", line 114, in step                                                                                                                  
    obs, reward, terminal = self._task.step(action)                                                                                                                                                            
  File "/home/mohit/anaconda3/envs/arm/lib/python3.7/site-packages/rlbench/task_environment.py", line 302, in step              
    list(arm_action), collision_checking=False)                                                                                                                                                                
  File "/home/mohit/anaconda3/envs/arm/lib/python3.7/site-packages/rlbench/task_environment.py", line 199, in _path_action    
    action, collision_checking, relative_to)                                                                                                                                                                   
  File "/home/mohit/anaconda3/envs/arm/lib/python3.7/site-packages/rlbench/task_environment.py", line 146, in _path_action_get_path
    relative_to=relative_to,                                                                                                                                                                                   
  File "/home/mohit/anaconda3/envs/arm/lib/python3.7/site-packages/pyrep/robots/arms/arm.py", line 447, in get_path                                                                                            
    relative_to)                                                                                                                                                                                               
  File "/home/mohit/anaconda3/envs/arm/lib/python3.7/site-packages/pyrep/robots/arms/arm.py", line 390, in get_nonlinear_path                                                                                  
    strings=[algorithm.value])                                                                                                                                                                                 
  File "/home/mohit/anaconda3/envs/arm/lib/python3.7/site-packages/pyrep/backend/utils.py", line 67, in script_call                                                                                            
    list(floats), list(strings), bytes)                                                                                                                                                                          File "/home/mohit/anaconda3/envs/arm/lib/python3.7/site-packages/pyrep/backend/sim.py", line 702, in simExtCallScriptFunction                                                                                
    _check_return(ret)                                                                                                                                                                                           File "/home/mohit/anaconda3/envs/arm/lib/python3.7/site-packages/pyrep/backend/sim.py", line 32, in _check_return                                                                                            
    'The call failed on the V-REP side. Return value: %d' % ret)                                                                                                                                               
RuntimeError: The call failed on the V-REP side. Return value: -1

here is the VREP preview which shows the crash log q_attention_code_fail 2021-08-07 15-55-06

opened by mohitsharma0690 1

minimal changes to enable agent.act on GPU

Small changes to agent.act to move cpu tensors to gpu, so that when the agent is loaded to EnvRunner and gets mapped to a GPU, the stepping time (either train Or eval) is greatly reduced. Notice how calling an extra .cpu() before tensor.numpy() doesn't do any damage.

opened by MandiZhao 1
How to debug C2F-ARM?

Hi, thanks for this excellent work. C2F-ARM is trained using the YARR framework, but I want to know how should I do to debug the training process line by line. Could you have a idea to solve it?

opened by kevin-xuan 0
About controlling Franka robot by HTC VIVE

Dear Stephen,

Thanks for sharing your excellent work.

While watching the video of your real robot experiments, we were fascinated by how well you controlled the Franka robot. Could you please provide your solution for using the HTC VIVE controller to control the Franka robot? It will be nice of you to release your program of controlling the Franka by HTC VIVE.

Looking forward to your reply!

Best regards, Xiang

opened by weixiang-smart 0

how to launch ARM instead of C2FARM

Hello. I tried to run launch with ARM as a method instead of C2FARM.

python launch.py method=ARM rlbench.task=take_lid_off_saucepan rlbench.demo_path=/home/softgear/stepjam_ARM/my_save_dir framework.gpu=0 I met warnings and errors as following. How can I launch ARM ?


launch.py:332: UserWarning: 
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_name='config', config_path='conf')
/home/softgear/.local/lib/python3.8/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'config': Defaults list is missing `_self_`. See https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order for more information
  warnings.warn(msg, UserWarning)
/home/softgear/.local/lib/python3.8/site-packages/hydra/core/default_element.py:124: UserWarning: In 'method/ARM': Usage of deprecated keyword in package header '# @package _group_'.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/changes_to_package_header for more information
  deprecation_warning(
/home/softgear/.local/lib/python3.8/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[2022-11-01 11:38:44,226][root][INFO] - 
method:
  name: ARM
  activation: lrelu
  q_conf: true
  alpha: 0.05
  alpha_lr: 0.0001
  alpha_auto_tune: false
  next_best_pose_critic_lr: 0.0025
  next_best_pose_actor_lr: 0.001
  next_best_pose_critic_weight_decay: 1.0e-05
  next_best_pose_actor_weight_decay: 1.0e-05
  crop_shape:
  - 16
  - 16
  next_best_pose_tau: 0.005
  next_best_pose_critic_grad_clip: 5
  next_best_pose_actor_grad_clip: 5
  qattention_grad_clip: 5
  qattention_tau: 0.005
  qattention_lr: 0.0005
  qattention_weight_decay: 1.0e-05
  qattention_lambda_qreg: 1.0e-07
  demo_augmentation: true
  demo_augmentation_every_n: 10
rlbench:
  task: take_lid_off_saucepan
  demos: 10
  demo_path: /home/softgear/stepjam_ARM/my_save_dir
  episode_length: 10
  cameras:
  - front
  camera_resolution:
  - 128
  - 128
  scene_bounds:
  - -0.3
  - -0.5
  - 0.6
  - 0.7
  - 0.5
  - 1.6
replay:
  batch_size: 128
  timesteps: 1
  prioritisation: true
  use_disk: false
  path: /tmp/arm/replay
framework:
  log_freq: 100
  save_freq: 100
  train_envs: 1
  eval_envs: 1
  replay_ratio: 128
  transitions_before_train: 200
  tensorboard_logging: true
  csv_logging: true
  training_iterations: 40000
  gpu: 0
  env_gpu: 0
  logdir: /tmp/arm_test/
  seeds: 1

[2022-11-01 11:38:44,254][root][INFO] - Using training device cuda:0.
[2022-11-01 11:38:44,254][root][INFO] - Using env device cuda:0.
[2022-11-01 11:38:44,264][root][INFO] - CWD:/tmp/arm_test/take_lid_off_saucepan/ARM
[2022-11-01 11:38:44,264][root][INFO] - Starting seed 0.
[2022-11-01 11:38:44,265][root][INFO] - Creating a PrioritizedReplayBuffer replay memory with the following parameters:
[2022-11-01 11:38:44,265][root][INFO] - 	 timesteps: 1
[2022-11-01 11:38:44,265][root][INFO] - 	 replay_capacity: 100000
[2022-11-01 11:38:44,265][root][INFO] - 	 batch_size: 128
[2022-11-01 11:38:44,265][root][INFO] - 	 update_horizon: 1
[2022-11-01 11:38:44,265][root][INFO] - 	 gamma: 0.990000
[2022-11-01 11:38:44,265][root][INFO] - 	 saving to RAM
[2022-11-01 11:38:44,269][root][INFO] - Filling replay with demos...
[2022-11-01 11:38:45,682][root][INFO] - Replay filled with demos.
[2022-11-01 11:38:46,631][root][INFO] - # Q-attention Params: 86386
/home/softgear/stepjam_ARM/ARM/arm/arm/next_best_pose_agent.py:148: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at  ../torch/csrc/utils/tensor_new.cpp:201.)
  action_min_max = torch.tensor(self._action_min_max).to(device)
[2022-11-01 11:38:46,656][root][INFO] - # NBP Critic Params: 1085572
[2022-11-01 11:38:46,656][root][INFO] - # NBP Actor Params: 51152
/home/softgear/stepjam_ARM/ARM/launch.py:332: UserWarning: 
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_name='config', config_path='conf')
/home/softgear/stepjam_ARM/ARM/launch.py:332: UserWarning: 
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_name='config', config_path='conf')
/home/softgear/stepjam_ARM/ARM/launch.py:332: UserWarning: 
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_name='config', config_path='conf')
/home/softgear/stepjam_ARM/ARM/arm/arm/qattention_agent.py:35: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  indices = torch.cat((m // t_shape[-1], m % t_shape[-1]), dim=1)
/home/softgear/stepjam_ARM/ARM/arm/arm/next_best_pose_agent.py:148: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at  ../torch/csrc/utils/tensor_new.cpp:201.)
  action_min_max = torch.tensor(self._action_min_max).to(device)
/home/softgear/stepjam_ARM/ARM/arm/arm/next_best_pose_agent.py:148: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at  ../torch/csrc/utils/tensor_new.cpp:201.)
  action_min_max = torch.tensor(self._action_min_max).to(device)
/home/softgear/stepjam_ARM/ARM/arm/arm/qattention_agent.py:35: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  indices = torch.cat((m // t_shape[-1], m % t_shape[-1]), dim=1)
/home/softgear/stepjam_ARM/ARM/arm/arm/qattention_agent.py:35: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  indices = torch.cat((m // t_shape[-1], m % t_shape[-1]), dim=1)
[CoppeliaSim:loadinfo]   done.
Process train_env0:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/softgear/.local/lib/python3.8/site-packages/yarr/runners/_env_runner.py", line 169, in _run_env
    raise e
  File "/home/softgear/.local/lib/python3.8/site-packages/yarr/runners/_env_runner.py", line 143, in _run_env
    for replay_transition in generator:
  File "/home/softgear/.local/lib/python3.8/site-packages/yarr/utils/rollout_generator.py", line 30, in generator
    agent_obs_elems = {k: np.array(v) for k, v in
  File "/home/softgear/.local/lib/python3.8/site-packages/yarr/utils/rollout_generator.py", line 30, in <dictcomp>
    agent_obs_elems = {k: np.array(v) for k, v in
  File "/home/softgear/.local/lib/python3.8/site-packages/torch/_tensor.py", line 757, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
[2022-11-01 11:38:55,800][root][WARNING] - Env train_env0 failed (1 times <= 10). restarting
[CoppeliaSim:loadinfo]   done.
^C (SIGINT)

opened by softgearko 0

ARM (and YARR) conflicts with current RLBench (1.2.0)

Hi, can you help me and tell me which rlbench and yarr versions/tags are compatible with each other? For most of the problems I believe that pytorch is the issue and I don't find in any requirements.txt which one you use to make things work.

I observe this error

Process train_env0:
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/ARM/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/user/anaconda3/envs/ARM/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/anaconda3/envs/ARM/lib/python3.9/site-packages/yarr/runners/_env_runner.py", line 169, in _run_env
    raise e
  File "/home/user/anaconda3/envs/ARM/lib/python3.9/site-packages/yarr/runners/_env_runner.py", line 143, in _run_env
    for replay_transition in generator:
  File "/home/user/anaconda3/envs/ARM/lib/python3.9/site-packages/yarr/utils/rollout_generator.py", line 35, in generator
    transition = env.step(act_result)
  File "/home/user/ARM/arm/custom_rlbench_env.py", line 128, in step
    obs, reward, terminal = self._task.step(action)
  File "/home/user/anaconda3/envs/ARM/lib/python3.9/site-packages/rlbench/task_environment.py", line 99, in step
    self._action_mode.action(self._scene, action)
  File "/home/user/anaconda3/envs/ARM/lib/python3.9/site-packages/rlbench/action_modes/action_mode.py", line 32, in action
    arm_action = np.array(action[:arm_act_size])
  File "/home/user/anaconda3/envs/ARM/lib/python3.9/site-packages/torch/_tensor.py", line 732, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
[2022-05-27 10:10:31,983][root][ERROR] - Env train_env0 failed too many times (11 times > 10)
Exception in thread EnvRunnerThread:
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/ARM/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/home/user/anaconda3/envs/ARM/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/anaconda3/envs/ARM/lib/python3.9/site-packages/yarr/runners/env_runner.py", line 134, in _run
    raise RuntimeError('Too many process failures.')
RuntimeError: Too many process failures.

I pulled the current version of RLbench and YARR and did a re-install of all packages in a new conda environment.

I am wondering if you use a different torch version that can handle tensor to numpy automatically better. Currently I fixed this by adding .cpu() in a few files

YARR

 git diff main

diff --git a/yarr/envs/rlbench_env.py b/yarr/envs/rlbench_env.py
index 6aad118..6460fb1 100644
--- a/yarr/envs/rlbench_env.py
+++ b/yarr/envs/rlbench_env.py
@@ -6,7 +6,7 @@ try:
 except (ModuleNotFoundError, ImportError) as e:
     print("You need to install RLBench: 'https://github.com/stepjam/RLBench'")
     raise e
-from rlbench.action_modes import ActionMode
+from rlbench.action_modes.action_mode import ActionMode
 from rlbench.backend.observation import Observation
 from rlbench.backend.task import Task
 
diff --git a/yarr/utils/rollout_generator.py b/yarr/utils/rollout_generator.py
index d4d2973..a3f12ee 100644
--- a/yarr/utils/rollout_generator.py
+++ b/yarr/utils/rollout_generator.py
@@ -27,7 +27,7 @@ class RolloutGenerator(object):
                                    deterministic=eval)
 
             # Convert to np if not already
-            agent_obs_elems = {k: np.array(v) for k, v in
+            agent_obs_elems = {k: np.array(v.cpu()) for k, v in
                                act_result.observation_elements.items()}
             extra_replay_elements = {k: np.array(v) for k, v in
                                      act_result.replay_elements.items()}
@@ -66,7 +66,7 @@ class RolloutGenerator(object):
                     prepped_data = {k: torch.tensor([v], device=self._env_device) for k, v in obs_history.items()}
                     act_result = agent.act(step_signal.value, prepped_data,
                                            deterministic=eval)
-                    agent_obs_elems_tp1 = {k: np.array(v) for k, v in
+                    agent_obs_elems_tp1 = {k: np.array(v.cpu()) for k, v in
                                            act_result.observation_elements.items()}
                     obs_tp1.update(agent_obs_elems_tp1)
                 replay_transition.final_observation = obs_tp1

(Side note: Also observe that with the recent changes in folder structure in RLbench I changed the import for ActionMode.)

RLBench

git diff master

diff --git a/rlbench/action_modes/action_mode.py b/rlbench/action_modes/action_mode.py
index 68171a37..a2c264ef 100644
--- a/rlbench/action_modes/action_mode.py
+++ b/rlbench/action_modes/action_mode.py
@@ -29,8 +29,8 @@ class MoveArmThenGripper(ActionMode):
 
     def action(self, scene: Scene, action: np.ndarray):
         arm_act_size = np.prod(self.arm_action_mode.action_shape(scene))
-        arm_action = np.array(action[:arm_act_size])
-        ee_action = np.array(action[arm_act_size:])
+        arm_action = np.array(action[:arm_act_size].cpu())
+        ee_action = np.array(action[arm_act_size:].cpu())
         self.arm_action_mode.action(scene, arm_action)
         self.gripper_action_mode.action(scene, ee_action)

I believe that the error comes from a change somewhere else though, or that you use a torch version that can deal with this? Can you please help me? I don't know which pyorch version you are using. It is missing in the requirements.txt. I installed pytorch with conda. conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

An error that I am unable to fix is this one

Exception in thread EnvRunnerThread:
Traceback (most recent call last):
  File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/site-packages/yarr/runners/env_runner.py", line 141, in _run
    new_transitions = self._update()
  File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/site-packages/yarr/runners/env_runner.py", line 86, in _update
    self._agent_summaries = list(
  File "<string>", line 2, in __getitem__
  File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/multiprocessing/managers.py", line 825, in _callmethod
    raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError: 
---------------------------------------------------------------------------
Unserializable message: Traceback (most recent call last):
  File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/multiprocessing/managers.py", line 300, in serve_client
    send(msg)
  File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 249, in reduce_tensor
    event_sync_required) = storage._share_cuda_()
  File "/home/alexander/anaconda3/envs/ARM/lib/python3.9/site-packages/torch/storage.py", line 623, in _share_cuda_
    return self._storage._share_cuda_(*args, **kwargs)
RuntimeError: Attempted to send CUDA tensor received from another process; this is not currently supported. Consider cloning before sending.

---------------------------------------------------------------------------
[W CudaIPCTypes.cpp:92] Producer process tried to deallocate over 1000 memory blocks referred by consumer processes. Deallocation might be significantly slowed down. We assume it will never going to be the case, but if it is, please file but to https://github.com/pytorch/pytorch

Do you have advice? It seems to me like pytorch is the issue for most of the problems I mentioned.

using: Python 3.9.12

opened by alexanderdurr 8

Attention-driven Robot Manipulation (ARM) which includes Q-attention

Related tags

Overview

Attention-driven Robotic Manipulation (ARM)

Installation

Running experiments

Comments

RuntimeError: Broken dataset assumption

About the version of RLBench

Request for the source code of the real robot experiments

question about calculate the attended coordinates in the demo trajectories

Code crashes when trying to run.

minimal changes to enable agent.act on GPU

How to debug C2F-ARM?

About controlling Franka robot by HTC VIVE

how to launch ARM instead of C2FARM

ARM (and YARR) conflicts with current RLBench (1.2.0)

Owner

Stephen James

ManipulaTHOR, a framework that facilitates visual manipulation of objects using a robotic arm

Make a Turtlebot3 follow a figure 8 trajectory and create a robot arm and make it follow a trajectory

ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge (ManiSkill Challenge), a large-scale learning-from-demonstrations benchmark for object manipulation.

Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments (CoRL 2020)

Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation (CoRL 2021)

Guiding evolutionary strategies by (inaccurate) differentiable robot simulators @ NeurIPS, 4th Robot Learning Workshop

Control-Raspberry-Pi-Robot-using-Hand-Gestures - A 4WD Robot car based on Raspberry Pi that controlled by hand gestures(using openCV and mediapipe)

Space robot - (Course Project) Using the space robot to capture the target satellite that is disabled and spinning, then stabilize and fix it up

PyTorch implementation of ARM-Net: Adaptive Relation Modeling Network for Structured Data.

Doosan robotic arm, simulation, control, visualization in Gazebo and ROS2 for Reinforcement Learning.

Keyword spotting on Arm Cortex-M Microcontrollers

High performance Cross-platform Inference-engine, you could run Anakin on x86-cpu,arm, nv-gpu, amd-gpu,bitmain and cambricon devices.

A robotic arm that mimics hand movement through MediaPipe tracking.

Pytorch implementation of "ARM: Any-Time Super-Resolution Method"

nnDetection is a self-configuring framework for 3D (volumetric) medical object detection which can be applied to new data sets without manual intervention. It includes guides for 12 data sets that were used to develop and evaluate the performance of the proposed method.

A hobby project which includes a hand-gesture based virtual piano using a mobile phone camera and OpenCV library functions

A module that used for encrypt code which includes RSA and AES

Home repository for the Regularized Greedy Forest (RGF) library. It includes original implementation from the paper and multithreaded one written in C++, along with various language-specific wrappers.

Deep Image Search is an AI-based image search engine that includes deep transfor learning features Extraction and tree-based vectorized search.