Official implementation of the Implicit Behavioral Cloning (IBC) algorithm

Google Research

Last update: Dec 9, 2022

Implicit Behavioral Cloning

This codebase contains the official implementation of the Implicit Behavioral Cloning (IBC) algorithm from our paper:

Implicit Behavioral Cloning (website link) (arXiv link)
Pete Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian Wong, Johnny Lee, Igor Mordatch, Jonathan Tompson
Conference on Robot Learning (CoRL) 2021

Abstract

We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models. We present extensive experiments on this finding, and we provide both intuitive insight and theoretical arguments distinguishing the properties of implicit models compared to their explicit counterparts, particularly with respect to approximating complex, potentially discontinuous and multi-valued (set-valued) functions. On robotic policy learning tasks we show that implicit behavioral cloning policies with energy-based models (EBM) often outperform common explicit (Mean Square Error, or Mixture Density) behavioral cloning policies, including on tasks with high-dimensional action spaces and visual image inputs. We find these policies provide competitive results or outperform state-of-the-art offline reinforcement learning methods on the challenging human-expert tasks from the D4RL benchmark suite, despite using no reward information. In the real world, robots with implicit policies can learn complex and remarkably subtle behaviors on contact-rich tasks from human demonstrations, including tasks with high combinatorial complexity and tasks requiring 1mm precision.
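
To make the implicit/explicit distinction concrete, here is a minimal toy sketch (not code from this repository). An explicit policy maps an observation directly to an action, while an implicit policy returns the action that minimizes an energy function over candidate actions. The hand-written `energy` function, the grid of candidates, and all names below are illustrative assumptions; the grid search merely stands in for the samplers used in this codebase, and the "MSE-optimal" value assumes both branches appear equally often in the data.

```python
# Toy illustration of implicit vs. explicit prediction on a multi-valued
# target: for input x, both y = +x and y = -x are valid "expert" outputs.
# Everything here is hypothetical and hand-written; no model is trained.


def energy(x: float, y: float) -> float:
    """Hand-crafted energy that is zero exactly on the two branches y = +/-x."""
    return min((y - x) ** 2, (y + x) ** 2)


def implicit_predict(x: float, num_candidates: int = 201) -> float:
    """Return the candidate y in [-1, 1] with the lowest energy (argmin)."""
    candidates = [-1.0 + 2.0 * i / (num_candidates - 1) for i in range(num_candidates)]
    return min(candidates, key=lambda y: energy(x, y))


def mse_optimal_predict(x: float) -> float:
    """Mean of the two valid targets, which is what minimizes squared error."""
    targets = [x, -x]
    return sum(targets) / len(targets)


if __name__ == "__main__":
    x = 0.5
    print("implicit:", implicit_predict(x))     # lands on one of the two branches
    print("explicit:", mse_optimal_predict(x))  # averages the two branches
```

In this toy case the averaged prediction lies between the two branches and matches neither, whereas the argmin always selects a valid branch. This is the kind of multi-valued behavior the abstract refers to.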

Prerequisites

The code for this project uses Python 3.7+ and the following pip packages:

python3 -m pip install --upgrade pip
pip install \
  absl-py==0.12.0 \
  gin-config==0.4.0 \
  matplotlib==3.4.3 \
  mediapy==1.0.3 \
  opencv-python==4.5.3.56 \
  pybullet==3.1.6 \
  scipy==1.7.1 \
  tensorflow==2.6.0 \
  tensorflow-probability==0.13.0 \
  tf-agents-nightly==0.10.0.dev20210930 \
  tqdm==4.62.2

(Optional): For Mujoco support, see docs/mujoco_setup.md. Recommended to skip it unless you specifically want to run the Adroit and Kitchen environments.
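
Since several of the issues reported for this project come down to mismatched package versions, it can help to compare the installed versions against the pins in the pip command above. The following is a minimal sketch, not part of the repository: the helper names `parse_pins` and `check_pins` are hypothetical, and on Python 3.7 it assumes the `importlib_metadata` backport is installed.

```python
# Sketch: compare installed package versions against the pins listed above.
import re

try:  # importlib.metadata is in the standard library from Python 3.8 onward
    from importlib import metadata as importlib_metadata
except ImportError:  # Python 3.7 needs the importlib_metadata backport
    import importlib_metadata

# The pins copied from the pip install command in Prerequisites.
PINS_TEXT = """
absl-py==0.12.0
gin-config==0.4.0
matplotlib==3.4.3
mediapy==1.0.3
opencv-python==4.5.3.56
pybullet==3.1.6
scipy==1.7.1
tensorflow==2.6.0
tensorflow-probability==0.13.0
tf-agents-nightly==0.10.0.dev20210930
tqdm==4.62.2
"""


def parse_pins(text):
    """Parse 'name==version' lines into a {name: version} dictionary."""
    pins = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([A-Za-z0-9_.\-]+)==(\S+)\s*$", line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def check_pins(pins):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = importlib_metadata.version(name)
        except importlib_metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in sorted(check_pins(parse_pins(PINS_TEXT)).items()):
        print(f"{name}: expected {expected}, found {installed}")
```

An empty output means every pinned package is installed at the listed version.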

Quickstart: from 0 to a trained IBC policy in 10 minutes.

Step 1: Install the Python packages listed above in Prerequisites.

Step 2: Run the unit tests (this should take less than a minute) from the directory just above the top-level ibc directory:

./ibc/run_tests.sh

Step 3: Check that TensorFlow has GPU access:

python3 -c "import tensorflow as tf; print(bool(tf.config.list_physical_devices('GPU')))"

If the above prints False, see the following requirements, notably CUDA 11.2 and cuDNN 8.1.0: https://www.tensorflow.org/install/gpu#software_requirements.

Step 4: Let's do an example Block Pushing task, so first let's download oracle data (or see Tasks for how to generate it):

cd ibc/data
wget https://storage.googleapis.com/brain-reach-public/ibc_data/block_push_states_location.zip
unzip block_push_states_location.zip && rm block_push_states_location.zip
cd ../..

Step 5: Set PYTHONPATH to include the directory just above top-level ibc, so if you've been following the commands above it is:

export PYTHONPATH=$PYTHONPATH:${PWD}
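
If later steps fail with import errors, a quick way to confirm that the PYTHONPATH setting took effect is to ask Python whether the top-level ibc package resolves. This is a small sketch with a hypothetical helper name (`is_importable`); it only checks that the module can be located and does not import it.

```python
# Sketch: check whether a top-level module or package can be found on sys.path.
import importlib.util


def is_importable(module_name: str) -> bool:
    """True if a top-level module or package resolves on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run this from the same shell in which PYTHONPATH was exported.
    print("ibc importable:", is_importable("ibc"))
```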

Step 6: On that example Block Pushing task, we'll next do a training + evaluation with Implicit BC:

./ibc/ibc/configs/pushing_states/run_mlp_ebm.sh

Some notes:

On an example single-GPU machine (RTX 2080 Ti), the above trains at about 18 steps/sec, and should get to high success rates in 5,000 or 10,000 steps (roughly 5-10 minutes of training).
The mlp_ebm.gin is just one config, which is meant to be reasonably fast to train, with only 20 evals at each interval, and is not suitable for all tasks. See Tasks for more configs.
Due to the --video flag above, you can watch a video of the learned policy in action under /tmp/ibc_logs/mlp_ebm/ibc_dfo/... Navigate to the videos/ttl=7d subfolder; by default, one example .mp4 video is saved at every evaluation interval.
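
The "roughly 5-10 minutes" estimate in the notes above follows directly from the quoted throughput. As a small worked example (the function name is hypothetical, and time spent in evaluation intervals is not counted):

```python
# Sketch: convert a step count and a training throughput into minutes.


def training_minutes(num_steps: int, steps_per_sec: float) -> float:
    """Pure training time in minutes, ignoring evaluation and startup time."""
    return num_steps / steps_per_sec / 60.0


if __name__ == "__main__":
    for steps in (5_000, 10_000):
        minutes = training_minutes(steps, 18.0)
        print(f"{steps} steps at 18 steps/sec: {minutes:.1f} min")
    # prints 4.6 min for 5,000 steps and 9.3 min for 10,000 steps
```

The same function can be used to estimate run time for other configs once you have measured steps/sec on your own hardware.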

(Optional) Step 7: For the pybullet-based tasks, we also have real-time interactive visualization set up through a visualization server, so in one terminal:

cd <path_to>/ibc/..
export PYTHONPATH=$PYTHONPATH:${PWD}
python3 -m pybullet_utils.runServer

And in a different terminal run the oracle a few times with the --shared_memory flag:

cd <path_to>/ibc/..
export PYTHONPATH=$PYTHONPATH:${PWD}
python3 ibc/data/policy_eval.py -- \
  --alsologtostderr \
  --shared_memory \
  --num_episodes=3 \
  --policy=oracle_push \
  --task=PUSH

You're done with Quickstart! See below for more Tasks, and also see docs/codebase_overview.md and docs/workflow.md for additional info.

Tasks

Task: Particle

In this task, the goal is for the agent (black dot) to first go to the green dot, then the blue dot.

Example IBC policy	Example MSE policy

Get Data

We can either generate data from scratch, for example for 2D (takes 15 seconds):

./ibc/ibc/configs/particle/collect_data.sh

Or just download all the data for all different dimensions:

cd ibc/data/
wget https://storage.googleapis.com/brain-reach-public/ibc_data/particle.zip
unzip particle.zip && rm particle.zip
cd ../..

Train and Evaluate

Let's start with some small networks, on just the 2D version since it's easiest to visualize, and compare MSE and IBC. Here's a small-network (256x2) IBC-with-Langevin config, where 2 is the argument for the environment dimensionality.

./ibc/ibc/configs/particle/run_mlp_ebm_langevin.sh 2

And here's an identically sized network (256x2) but with an MSE config:

./ibc/ibc/configs/particle/run_mlp_mse.sh 2

For the above configurations, we suggest comparing the rollout videos, which you can find at /tmp/ibc_logs/...corresponding_directory../videos/. At the top of this section is shown a comparison at 10,000 training steps for the two different above configs.
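
Because the exact log directory depends on the config and the run, it can be convenient to search the log root for saved videos instead of navigating by hand. The sketch below is not part of the repository; `find_rollout_videos` is a hypothetical helper, and the only assumption taken from this README is that videos are saved as .mp4 files somewhere under /tmp/ibc_logs.

```python
# Sketch: list saved rollout videos under a log directory, newest first.
from pathlib import Path


def find_rollout_videos(log_root):
    """Return all .mp4 files below log_root, sorted by modification time."""
    root = Path(log_root)
    if not root.is_dir():
        return []  # nothing has been logged yet
    return sorted(root.rglob("*.mp4"), key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for video in find_rollout_videos("/tmp/ibc_logs"):
        print(video)
```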

And here are the best configs for IBC (with Langevin) and MSE, respectively, in this case run on the 16-dimensional environment:

./ibc/ibc/configs/particle/run_mlp_ebm_langevin_best.sh 16
./ibc/ibc/configs/particle/run_mlp_mse_best.sh 16

Note: the _best config is kind of slow for Langevin to train, but even just ./ibc/ibc/configs/particle/run_mlp_ebm_langevin.sh 16 (smaller network) seems to solve the 16-D environment pretty well, and is much faster to train.

Task: Block Pushing (from state observations)

Get Data

We can either generate data from scratch (~2 minutes for 2,000 episodes: 200 each across 10 replicas):

./ibc/ibc/configs/pushing_states/collect_data.sh

Or we can download data from the web:

cd ibc/data/
wget https://storage.googleapis.com/brain-reach-public/ibc_data/block_push_states_location.zip
unzip 'block_push_states_location.zip' && rm block_push_states_location.zip
cd ../..

Train and Evaluate

Here's a reasonably fast-to-train config for IBC with DFO:

./ibc/ibc/configs/pushing_states/run_mlp_ebm.sh

Or here's a config for IBC with Langevin:

./ibc/ibc/configs/pushing_states/run_mlp_ebm_langevin.sh

Or here's a comparable, reasonably fast-to-train config for MSE:

./ibc/ibc/configs/pushing_states/run_mlp_mse.sh

Or to run the best configs for IBC, MSE, and MDN, respectively (some of these might be slower to train than the above):

./ibc/ibc/configs/pushing_states/run_mlp_ebm_best.sh
./ibc/ibc/configs/pushing_states/run_mlp_mse_best.sh
./ibc/ibc/configs/pushing_states/run_mlp_mdn_best.sh

Task: Block Pushing (from image observations)

Get Data

Download data from the web:

cd ibc/data/
wget https://storage.googleapis.com/brain-reach-public/ibc_data/block_push_visual_location.zip
unzip 'block_push_visual_location.zip' && rm block_push_visual_location.zip
cd ../..

Train and Evaluate

Here is an IBC with Langevin configuration which should actually converge faster than the IBC-with-DFO that we reported in the paper:

./ibc/ibc/configs/pushing_pixels/run_pixel_ebm_langevin.sh

And here are the best configs for IBC (with DFO), MSE, and MDN, respectively:

./ibc/ibc/configs/pushing_pixels/run_pixel_ebm_best.sh
./ibc/ibc/configs/pushing_pixels/run_pixel_mse_best.sh
./ibc/ibc/configs/pushing_pixels/run_pixel_mdn_best.sh

Task: D4RL Adroit and Kitchen

Get Data

The D4RL human demonstration training data used for the paper submission can be downloaded using the commands below. This data has been processed into a .tfrecord format from the original D4RL data format:

cd ibc/data && mkdir -p d4rl_trajectories && cd d4rl_trajectories
wget https://storage.googleapis.com/brain-reach-public/ibc_data/door-human-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/hammer-human-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/kitchen-complete-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/kitchen-mixed-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/kitchen-partial-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/pen-human-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/relocate-human-v0.zip
unzip '*.zip' && rm *.zip
cd ../../..

Run Train Eval:

On a 2080 Ti GPU test, the IBC config below trains at only 1.7 steps/sec, but it is about 10x faster on TPUv3. Here are the best configs for IBC (with Langevin) and MSE, respectively:

./ibc/ibc/configs/d4rl/run_mlp_ebm_langevin_best.sh pen-human-v0
./ibc/ibc/configs/d4rl/run_mlp_mse_best.sh pen-human-v0

The above commands will run on the pen-human-v0 environment, but you can swap this arg for any of the provided Adroit/Kitchen environments.

Here also is an MDN config you can try. The network size is tiny but if you increase it heavily then it seems to get NaNs during training. In general MDNs can be finicky. A solution should be possible though.

./ibc/ibc/configs/d4rl/run_mlp_mdn.sh pen-human-v0

Summary for Reproducing Results

For the tasks that we've been able to open-source, results from the paper should be reproducible by using the linked data and command-line args below.

Task	Figure/Table in paper	Data	Train + Eval commands
Coordinate regression	Figure 4	See colab	See colab
D4RL Adroit + Kitchen	Table 2	Link	Link
N-D particle	Figure 6	Link	Link
Simulated pushing, single target, states	Table 3	Link	Link
Simulated pushing, single target, pixels	Table 3	Link	Link

Citation

If you found our paper/code useful in your research, please consider citing:

@article{florence2021implicit,
    title={Implicit Behavioral Cloning},
    author={Florence, Pete and Lynch, Corey and Zeng, Andy and Ramirez, Oscar and Wahid, Ayzaan and Downs, Laura and Wong, Adrian and Lee, Johnny and Mordatch, Igor and Tompson, Jonathan},
    journal={Conference on Robot Learning (CoRL)},
    month = {November},
    year={2021}
}

Comments

Cannot register 2 metrics with the same name: /tensorflow/api/keras/optimizers

Just tried running tests on Ubuntu 20.04, CPython 3.8.10, but get the following error:

$ cd ibc/..
$ ./ibc/run_tests.sh
...
PYTHONPATH=:{parent}/ibc/.. python3 {parent}/ibc/ibc/agents/mcmc_test.py --alsologtostderr
...
2021-11-08 16:03:33.113843: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-08 16:03:33.113875: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-11-08 16:03:34.161325: E tensorflow/core/lib/monitoring/collection_registry.cc:77] Cannot register 2 metrics with the same name: /tensorflow/api/keras/optimizers
...
tensorflow.python.framework.errors_impl.AlreadyExistsError: Another metric with the same name already exists.
ERROR: 'PYTHONPATH=:{parent}/ibc/.. python3 {parent}/ibc/ibc/agents/mcmc_test.py --alsologtostderr' failed!

Full stack trace: https://gist.github.com/EricCousineau-TRI/ac6e9943606e6b9f7e335882f7caa350

~Not sure if it's b/c of CUDA error.~ ~I have stock Ubuntu CUDA 10.1 on my machine, so will try out NVidia-installed CUDA 11.0.~ See below.

Also happens when trying to run training script, ./ibc/ibc/configs/pushing_states/run_mlp_ebm.sh

opened by EricCousineau-TRI 10

step after collecting data

In collect_oracle.py, time_step = env.step(action) is called before the observation is recorded with episode_data.time_step.append(time_step). I apologize if I have misunderstood the implementation, but as far as I can see, env.step(action) returns the next TimeStep. As a result, the action at the current time and the observation at the next time are stored as a pair under the same index.

Wouldn't it be correct to record the state of the system when the action is decided?
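
The concern can be illustrated with a toy example. This sketch does not reproduce collect_oracle.py; `CounterEnv` and both collection functions are hypothetical, and the environment simply returns a counter so that the index shift is easy to see.

```python
# Toy illustration of the two ways an (observation, action) pair can be stored.


class CounterEnv:
    """Hypothetical environment whose observation is a running sum of actions."""

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action
        return self.state  # observation *after* the action has been applied


def collect_next_obs_pairs(env, actions):
    """Pair each action with the observation returned by step (the reported issue)."""
    env.reset()
    pairs = []
    for action in actions:
        next_obs = env.step(action)
        pairs.append((next_obs, action))
    return pairs


def collect_current_obs_pairs(env, actions):
    """Pair each action with the observation it was chosen from."""
    obs = env.reset()
    pairs = []
    for action in actions:
        pairs.append((obs, action))
        obs = env.step(action)
    return pairs


if __name__ == "__main__":
    print(collect_next_obs_pairs(CounterEnv(), [1, 1, 1]))     # [(1, 1), (2, 1), (3, 1)]
    print(collect_current_obs_pairs(CounterEnv(), [1, 1, 1]))  # [(0, 1), (1, 1), (2, 1)]
```

In the first variant the initial observation never appears in the dataset and every action is paired with the state it produced rather than the state it was decided in.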

opened by syundo0730 3

Goal Tolerance Is Different for Different Methods

Hi, I found that

train_eval.goal_tolerance = 0.02

is set in EBM's config but not in MSE's config.

This difference makes the evaluation stricter for MSE-based BC, because the default is goal_tolerance=0.01 (code).

Setting train_eval.goal_tolerance = 0.01 for the EBM agent decreases its success rate from 1.0 to [0.85, 0.95] after training for 10k steps.
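
To see why the tolerance matters when comparing methods, consider a simplified model in which an episode counts as a success when the final distance to the goal is within goal_tolerance. This is an assumption made for illustration, not a description of how the environment computes success, and the distances below are made-up values rather than measurements.

```python
# Sketch: the same rollouts scored under two different goal tolerances.


def success_rate(final_distances, goal_tolerance):
    """Fraction of episodes whose final distance is within the tolerance."""
    successes = sum(1 for d in final_distances if d <= goal_tolerance)
    return successes / len(final_distances)


# Hypothetical final distances to the goal (illustrative only, not real data).
EXAMPLE_DISTANCES = [0.004, 0.008, 0.012, 0.018, 0.009]

if __name__ == "__main__":
    print(success_rate(EXAMPLE_DISTANCES, 0.02))  # 1.0
    print(success_rate(EXAMPLE_DISTANCES, 0.01))  # 0.6
```

Under this model, identical behavior receives a lower score with the tighter tolerance, so success rates are only comparable when both agents are evaluated with the same value.
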

opened by yenchenlin 3

README: Update pypi package versions for keras and tf-agents

Avoids collision between keras and tensorflow versions
Avoids collision between stable and nightly for tensorflow-probability

Resolves #1 (I think)

After using this, I see the following pip freeze output: https://gist.github.com/EricCousineau-TRI/ac6e9943606e6b9f7e335882f7caa350#file-new-pip-freeze-txt

@peteflorence This allowed me to run ./run_tests.sh with all of 'em passing!

opened by EricCousineau-TRI 0

Is image input provided in this codebase?

Hi @peteflorence,

I'm trying to reproduce the IBC project. Thanks for open sourcing this work! I was wondering if this codebase provides an interface for image input that can be used in the real world. My sincerest thanks in advance!

Sincerely, Vinson

opened by Vinson-Tang 2

Support for Categorical Action Space

I am currently working on implementing implicit BC for a task whose action space consists of both keyboard and mouse inputs. Is there a straightforward way to make this action space suitable for the implicit regression task?

opened by rokosbasilisk 0

Maybe the version of gym needs to be written to the readme

Hi @peteflorence, in the new version of Gym, 'done' has been removed from the values returned by step, and 'terminated' and 'truncated' have been added. As a result, running the unit tests with the new version of Gym fails. I think the version of Gym used for this project should be stated in the readme. Thanks, Vinson
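
For readers who hit this mismatch, the difference between the two step conventions can be bridged with a small adapter. The sketch below is not part of this codebase or of Gym; `normalize_step_result` is a hypothetical helper that accepts either the older 4-tuple or the newer 5-tuple and returns the older form.

```python
# Sketch: accept either Gym step() return convention and return the older one.


def normalize_step_result(result):
    """Return (observation, reward, done, info) from a 4- or 5-element result."""
    if len(result) == 5:
        # Newer convention: (observation, reward, terminated, truncated, info)
        observation, reward, terminated, truncated, info = result
        return observation, reward, bool(terminated or truncated), info
    if len(result) == 4:
        # Older convention: (observation, reward, done, info)
        observation, reward, done, info = result
        return observation, reward, bool(done), info
    raise ValueError(f"Unexpected step() result with {len(result)} elements")


if __name__ == "__main__":
    print(normalize_step_result(("obs", 0.0, False, True, {})))  # ('obs', 0.0, True, {})
    print(normalize_step_result(("obs", 1.0, False, {})))        # ('obs', 1.0, False, {})
```

Collapsing terminated and truncated into a single done flag loses the distinction between the two, so pinning the Gym version, as the issue suggests, remains the simpler fix for reproducing the results.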

opened by Vinson-Tang 1

Unit tests fail

Hi, it seems that the unit tests do not work out of the box. I'm working in a clean conda environment with Python 3.7.13. All of the prerequisites are installed with the versions described in the readme, as well as CUDA and cuDNN (Tensorflow has GPU access).

Here's the complete output from the test script:

Test script output
ibc-test ❯ ./ibc/run_tests.sh bash: /home/arc/miniconda3/envs/ibc-test/lib/libtinfo.so.6: no version information available (required by bash) Running run_tests.sh in directory /home/arc/noah Running tests: /home/arc/noah/ibc/environments/block_pushing/block_pushing_multimodal_test.py /home/arc/noah/ibc/environments/block_pushing/block_pushing_test.py /home/arc/noah/ibc/environments/utils/utils_pybullet_test.py /home/arc/noah/ibc/environments/utils/xarm_sim_robot_test.py /home/arc/noah/ibc/environments/particle/particle_test.py /home/arc/noah/ibc/ibc/agents/mcmc_test.py /home/arc/noah/ibc/ibc/train/stats_test.py /home/arc/noah/ibc/data/dataset_test.py *********************************************************************** Running test /home/arc/noah/ibc/environments/block_pushing/block_pushing_multimodal_test.py *********************************************************************** PYTHONPATH=:/home/arc/noah:/home/arc/noah/ibc/.. python3 /home/arc/noah/ibc/environments/block_pushing/block_pushing_multimodal_test.py --alsologtostderr /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses import imp /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/keras_preprocessing/image/utils.py:23: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead. 'nearest': pil_image.NEAREST, /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/keras_preprocessing/image/utils.py:24: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead. 
'bilinear': pil_image.BILINEAR, /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/keras_preprocessing/image/utils.py:25: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead. 'bicubic': pil_image.BICUBIC, /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/keras_preprocessing/image/utils.py:28: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead. if hasattr(pil_image, 'HAMMING'): /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/keras_preprocessing/image/utils.py:30: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead. if hasattr(pil_image, 'BOX'): /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/keras_preprocessing/image/utils.py:33: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead. if hasattr(pil_image, 'LANCZOS'): /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/__init__.py:56: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. if (distutils.version.LooseVersion(tf_version) < /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tensorflow_probability/python/__init__.py:61: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. if (distutils.version.LooseVersion(tf.__version__) < /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/utils/common.py:87: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. and (distutils.version.LooseVersion(tf.__version__) <= pybullet build time: Jun 28 2022 14:19:23 /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/envs/registration.py:416: UserWarning: WARN: The `registry.env_specs` property along with `EnvSpecTree` is deprecated. 
Please use `registry` directly as a dictionary instead. "The `registry.env_specs` property along with `EnvSpecTree` is deprecated. Please use `registry` directly as a dictionary instead." 2022-06-29 13:44:14.864729: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-06-29 13:44:14.868489: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-06-29 13:44:14.868817: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero Running tests under Python 3.7.13: /home/arc/miniconda3/envs/ibc-test/bin/python3 [ RUN ] Blocks2DTest.test_load_push_env /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/spaces/box.py:112: UserWarning: WARN: Box bound precision lowered by casting to float32 logger.warn(f"Box bound precision lowered by casting to {self.dtype}") argv[0]= I0629 13:44:14.880214 140609892135296 utils_pybullet.py:85] Loading URDF plane.urdf I0629 13:44:14.885888 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/workspace.urdf I0629 13:44:14.886190 140609892135296 utils_pybullet.py:85] Loading URDF xarm/xarm6_robot.urdf I0629 13:44:14.908028 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/suction/suction-head-long.urdf I0629 13:44:14.911714 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/zone.urdf I0629 13:44:14.912028 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/zone2.urdf I0629 13:44:14.912325 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/block.urdf I0629 13:44:14.912554 140609892135296 
utils_pybullet.py:85] Loading URDF ibc/environments/assets/block2.urdf INFO:tensorflow:time(__main__.Blocks2DTest.test_load_push_env): 0.09s I0629 13:44:14.959522 140609892135296 test_util.py:2189] time(__main__.Blocks2DTest.test_load_push_env): 0.09s [ OK ] Blocks2DTest.test_load_push_env [ RUN ] Blocks2DTest.test_serialize_state_push argv[0]= I0629 13:44:14.963971 140609892135296 utils_pybullet.py:85] Loading URDF plane.urdf I0629 13:44:14.969090 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/workspace.urdf I0629 13:44:14.969393 140609892135296 utils_pybullet.py:85] Loading URDF xarm/xarm6_robot.urdf I0629 13:44:14.987162 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/suction/suction-head-long.urdf I0629 13:44:14.991161 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/zone.urdf I0629 13:44:14.991568 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/zone2.urdf I0629 13:44:14.991911 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/block.urdf I0629 13:44:14.992150 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/block2.urdf INFO:tensorflow:time(__main__.Blocks2DTest.test_serialize_state_push): 0.13s I0629 13:44:15.085130 140609892135296 test_util.py:2189] time(__main__.Blocks2DTest.test_serialize_state_push): 0.13s [ OK ] Blocks2DTest.test_serialize_state_push [ RUN ] Blocks2DTest.test_session [ SKIPPED ] Blocks2DTest.test_session [ RUN ] Blocks2DTest.test_validate_environment argv[0]= I0629 13:44:15.090482 140609892135296 utils_pybullet.py:85] Loading URDF plane.urdf I0629 13:44:15.095655 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/workspace.urdf I0629 13:44:15.095959 140609892135296 utils_pybullet.py:85] Loading URDF xarm/xarm6_robot.urdf I0629 13:44:15.113764 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/suction/suction-head-long.urdf I0629 
13:44:15.117837 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/zone.urdf I0629 13:44:15.118308 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/zone2.urdf I0629 13:44:15.118738 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/block.urdf I0629 13:44:15.119005 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/block2.urdf /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py:98: UserWarning: WARN: We recommend you to use a symmetric and normalized Box action space (range=[-1, 1]) https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html "We recommend you to use a symmetric and normalized Box action space (range=[-1, 1]) " /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py:217: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator. "Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator. " /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py:229: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed `return_info` to return information from the environment resetting. "Future gym versions will require that `Env.reset` can be passed `return_info` to return information from the environment resetting." /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py:234: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information. 
"Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information." /home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/spaces/box.py:197: UserWarning: WARN: Casting input x to numpy array. logger.warn("Casting input x to numpy array.") INFO:tensorflow:time(__main__.Blocks2DTest.test_validate_environment): 0.09s I0629 13:44:15.179256 140609892135296 test_util.py:2189] time(__main__.Blocks2DTest.test_validate_environment): 0.09s [ FAILED ] Blocks2DTest.test_validate_environment ====================================================================== FAIL: test_validate_environment (__main__.Blocks2DTest) Blocks2DTest.test_validate_environment ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/arc/noah/ibc/environments/block_pushing/block_pushing_multimodal_test.py", line 34, in test_validate_environment utils.validate_py_environment(env) File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/utils.py", line 75, in validate_py_environment time_step = environment.reset() File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/py_environment.py", line 196, in reset self._current_time_step = self._reset() File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/wrappers.py", line 111, in _reset return self._env.reset() File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/py_environment.py", line 196, in reset self._current_time_step = self._reset() File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/gym_wrapper.py", line 193, in _reset observation = self._gym_env.reset() File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/time_limit.py", line 66, in reset return self.env.reset(**kwargs) File 
"/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/order_enforcing.py", line 42, in reset return self.env.reset(**kwargs) File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/env_checker.py", line 47, in reset return passive_env_reset_check(self.env, **kwargs) File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py", line 247, in passive_env_reset_check _check_obs(obs, env.observation_space, "reset") File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py", line 115, in _check_obs ), f"{pre} is not contained with the observation space ({observation_space})" AssertionError: The observation returned by the `reset()` method is not contained with the observation space (Dict(block_translation: Box(-5.0, 5.0, (2,), float32), block_orientation: Box(-6.2831855, 6.2831855, (1,), float32), block2_translation: Box(-5.0, 5.0, (2,), float32), block2_orientation: Box(-6.2831855, 6.2831855, (1,), float32), effector_translation: Box([ 0.05 -0.6 ], [0.8 0.6], (2,), float32), effector_target_translation: Box([ 0.05 -0.6 ], [0.8 0.6], (2,), float32), target_translation: Box(-5.0, 5.0, (2,), float32), target_orientation: Box(-6.2831855, 6.2831855, (1,), float32), target2_translation: Box(-5.0, 5.0, (2,), float32), target2_orientation: Box(-6.2831855, 6.2831855, (1,), float32))) ---------------------------------------------------------------------- Ran 4 tests in 0.310s FAILED (failures=1, skipped=1) ERROR: 'PYTHONPATH=:/home/arc/noah:/home/arc/noah/ibc/.. python3 /home/arc/noah/ibc/environments/block_pushing/block_pushing_multimodal_test.py --alsologtostderr' failed!

Here's the last part of that formatted a bit more nicely:

INFO:tensorflow:time(__main__.Blocks2DTest.test_validate_environment): 0.09s
I0629 13:44:15.179256 140609892135296 test_util.py:2189] time(__main__.Blocks2DTest.test_validate_environment): 0.09s
[ FAILED ] Blocks2DTest.test_validate_environment
======================================================================
FAIL: test_validate_environment (__main__.Blocks2DTest)
Blocks2DTest.test_validate_environment
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/arc/noah/ibc/environments/block_pushing/block_pushing_multimodal_test.py", line 34, in test_validate_environment
    utils.validate_py_environment(env)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/utils.py", line 75, in validate_py_environment
    time_step = environment.reset()
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/py_environment.py", line 196, in reset
    self._current_time_step = self._reset()
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/wrappers.py", line 111, in _reset
    return self._env.reset()
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/py_environment.py", line 196, in reset
    self._current_time_step = self._reset()
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/gym_wrapper.py", line 193, in _reset
    observation = self._gym_env.reset()
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/time_limit.py", line 66, in reset
    return self.env.reset(**kwargs)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/order_enforcing.py", line 42, in reset
    return self.env.reset(**kwargs)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/env_checker.py", line 47, in reset
    return passive_env_reset_check(self.env, **kwargs)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py", line 247, in passive_env_reset_check
    _check_obs(obs, env.observation_space, "reset")
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py", line 115, in _check_obs
    ), f"{pre} is not contained with the observation space ({observation_space})"
AssertionError: The observation returned by the `reset()` method is not contained with the observation space (Dict(block_translation: Box(-5.0, 5.0, (2,), float32), block_orientation: Box(-6.2831855, 6.2831855, (1,), float32), block2_translation: Box(-5.0, 5.0, (2,), float32), block2_orientation: Box(-6.2831855, 6.2831855, (1,), float32), effector_translation: Box([ 0.05 -0.6 ], [0.8 0.6], (2,), float32), effector_target_translation: Box([ 0.05 -0.6 ], [0.8 0.6], (2,), float32), target_translation: Box(-5.0, 5.0, (2,), float32), target_orientation: Box(-6.2831855, 6.2831855, (1,), float32), target2_translation: Box(-5.0, 5.0, (2,), float32), target2_orientation: Box(-6.2831855, 6.2831855, (1,), float32)))
----------------------------------------------------------------------

The issue seems to be that several fields of the observation returned by BlockPushMultimodal._compute_state() need to be converted to np arrays with dtype np.float32. After doing that and running the test again, I get the following error instead:

INFO:tensorflow:time(__main__.Blocks2DTest.test_validate_environment): 0.1s
I0629 14:04:36.733099 140095945187712 test_util.py:2189] time(__main__.Blocks2DTest.test_validate_environment): 0.1s
[ FAILED ] Blocks2DTest.test_validate_environment
======================================================================
ERROR: test_validate_environment (__main__.Blocks2DTest)
Blocks2DTest.test_validate_environment
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/arc/noah/ibc/environments/block_pushing/block_pushing_multimodal_test.py", line 34, in test_validate_environment
    utils.validate_py_environment(env)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/utils.py", line 84, in validate_py_environment
    time_step = environment.step(action)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/py_environment.py", line 233, in step
    self._current_time_step = self._step(action)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/wrappers.py", line 117, in _step
    time_step = self._env.step(action)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/py_environment.py", line 233, in step
    self._current_time_step = self._step(action)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/gym_wrapper.py", line 215, in _step
    observation, reward, self._done, self._info = self._gym_env.step(action)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/time_limit.py", line 49, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/order_enforcing.py", line 37, in step
    return self.env.step(action)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/env_checker.py", line 39, in step
    return passive_env_step_check(self.env, action)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py", line 273, in passive_env_step_check
    if np.any(np.isnan(obs)):
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
----------------------------------------------------------------------

This is the same error as in #14 so I'd guess these issues are related.
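
For reference, the float32 conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's actual patch: to_float32_observation is a hypothetical helper, and the field values are made up to mirror a few entries of the observation space printed in the first traceback. NumPy does not treat float64 as safely castable to float32, which is one plausible reason a float64 field would fail a float32 Box check.

```python
import collections

import numpy as np


def to_float32_observation(obs):
    """Return a copy of a dict observation with every field as a float32 array.

    Hypothetical helper: it mirrors the fix described above (casting the
    fields of the observation to np.float32); it is not the actual patch.
    """
    return collections.OrderedDict(
        (key, np.asarray(value, dtype=np.float32)) for key, value in obs.items()
    )


# Made-up observation mixing float64 arrays, plain lists, and float32 arrays.
raw_obs = collections.OrderedDict(
    block_translation=np.array([0.1, -0.2]),  # float64 by default
    block_orientation=[0.5],  # plain Python list
    effector_translation=np.array([0.3, 0.0], dtype=np.float32),
)

fixed_obs = to_float32_observation(raw_obs)
for key, value in fixed_obs.items():
    print(key, value.dtype, value.shape)

# float64 -> float32 is not a "safe" cast in NumPy's casting rules.
safe_cast = np.can_cast(np.float64, np.float32)
print(safe_cast)
```

Casting in one place, where the observation dict is assembled, keeps every field consistent with the declared space instead of patching fields one at a time.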

Any thoughts?
opened by noahcgreen 2

Error running particle experiments

Hi, thanks for open sourcing this work! I tried running:

./ibc/ibc/configs/particle/run_mlp_ebm_langevin_best.sh 2

and got this error:

  File "ibc/ibc/train_eval.py", line 397, in main
    strategy=strategy)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "ibc/ibc/train_eval.py", line 279, in train_eval
    name_scope_suffix=f'_{env_name}')
  File "ibc/ibc/train_eval.py", line 353, in evaluation_step
    eval_actor.run()
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/tf_agents/train/actor.py", line 149, in run
    self._time_step, self._policy_state)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/tf_agents/drivers/py_driver.py", line 112, in run
    next_time_step = self.env.step(action_step.action)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/tf_agents/environments/py_environment.py", line 233, in step
    self._current_time_step = self._step(action)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/tf_agents/environments/wrappers.py", line 1015, in _step
    time_step = self._env.step(action)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/tf_agents/environments/py_environment.py", line 233, in step
    self._current_time_step = self._step(action)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/tf_agents/environments/gym_wrapper.py", line 215, in _step
    observation, reward, self._done, self._info = self._gym_env.step(action)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/gym/wrappers/order_enforcing.py", line 37, in step
    return self.env.step(action)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/gym/wrappers/env_checker.py", line 39, in step
    return passive_env_step_check(self.env, action)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/gym/utils/passive_env_checker.py", line 273, in passive_env_step_check
    if np.any(np.isnan(obs)):
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I'm not super familiar with tf-agents, but from some debugging it looks like obs is a dictionary and np.isnan is having an issue with it. Any thoughts on how one could fix this?
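
The behavior described above can be reproduced in isolation. The sketch below assumes only NumPy; the dictionary keys and values are made up, and dict_has_nan is a hypothetical helper that shows a per-field check. It is not the check that gym or tf-agents performs, and it is not offered as the fix for this issue.

```python
import numpy as np

# Made-up dict observation standing in for what the environment returns.
obs = {
    "position": np.array([0.1, 0.2], dtype=np.float32),
    "velocity": np.array([0.0, np.nan], dtype=np.float32),
}

# Calling np.isnan on the dict itself fails: NumPy wraps the dict in an
# object-dtype array, and isnan is not defined for object dtype.
try:
    np.isnan(obs)
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)


def dict_has_nan(observation):
    """Hypothetical helper: check each field of a dict observation for NaN."""
    return any(np.any(np.isnan(np.asarray(v))) for v in observation.values())


print(dict_has_nan(obs))
```

Checking field by field works because each value is a numeric array, whereas the dict as a whole has no numeric dtype.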

Thanks, Allan

opened by AllanYangZhou 2

other tasks in D4RL

Hi, thanks for providing the implementation of your work! I want to validate IBC on the locomotion tasks in D4RL, such as hopper and halfcheetah, but it seems you haven't provided the relevant datasets. Are there any scripts for converting the D4RL datasets to TFRecords? Or dataset links for direct download, like the ones for the Adroit tasks? :) Thanks

opened by pcheng2 0

Owner

Google Research


Proximal Backpropagation - a neural network training algorithm that takes implicit instead of explicit gradient steps

Proximal Backpropagation Proximal Backpropagation (ProxProp) is a neural network training algorithm that takes implicit instead of explicit gradient s

40 Dec 17, 2022

Official PyTorch implementation of Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations

Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations Zhenyu Jiang, Yifeng Zhu, Maxwell Svetlik, Kuan Fang, Yu

UT-Austin Robot Perception and Learning Lab

63 Jan 3, 2023

Digan - Official PyTorch implementation of Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks

DIGAN (ICLR 2022) Official PyTorch implementation of "Generating Videos with Dyn

147 Dec 31, 2022

Official PyTorch code of Holistic 3D Scene Understanding from a Single Image with Implicit Representation (CVPR 2021)

Implicit3DUnderstanding (Im3D) [Project Page] Holistic 3D Scene Understanding from a Single Image with Implicit Representation Cheng Zhang, Zhaopeng C

149 Jan 8, 2023

Official code release for ICCV 2021 paper SNARF: Differentiable Forward Skinning for Animating Non-rigid Neural Implicit Shapes.

235 Dec 26, 2022

Implementation of "Deep Implicit Templates for 3D Shape Representation"

Deep Implicit Templates for 3D Shape Representation Zerong Zheng, Tao Yu, Qionghai Dai, Yebin Liu. arXiv 2020. This repository is an implementation fo

144 Dec 7, 2022

Pytorch implementation of COIN, a framework for compression with implicit neural representations 🌸

COIN 🌸 This repo contains a Pytorch implementation of COIN: COmpression with Implicit Neural representations, including code to reproduce all experim

104 Dec 14, 2022

Pytorch implementation for "Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter".

Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter This is a pytorch-based implementation for paper Implicit Feature Alignme

61 Nov 12, 2022

Implementation for our ICCV2021 paper: Internal Video Inpainting by Implicit Long-range Propagation

Implicit Internal Video Inpainting Implementation for our ICCV2021 paper: Internal Video Inpainting by Implicit Long-range Propagation paper | project

202 Dec 30, 2022

A PyTorch implementation of Implicit Q-Learning

IQL-PyTorch This repository houses a minimal PyTorch implementation of Implicit Q-Learning (IQL), an offline reinforcement learning algorithm, along w

30 Dec 12, 2022

Unofficial Tensorflow 2 implementation of the paper Implicit Neural Representations with Periodic Activation Functions

Siren: Implicit Neural Representations with Periodic Activation Functions The unofficial Tensorflow 2 implementation of the paper Implicit Neural Repr

2 Jun 27, 2022

RL algorithm PPO and IRL algorithm AIRL written with Tensorflow.

RL algorithm PPO and IRL algorithm AIRL written with Tensorflow. They have a parallel sampling feature in order to increase computation speed (especially in high-performance computing (HPC)).

3 Dec 28, 2021

Official PyTorch implementation for FastDPM, a fast sampling algorithm for diffusion probabilistic models

Official PyTorch implementation for "On Fast Sampling of Diffusion Probabilistic Models". FastDPM generation on CIFAR-10, CelebA, and LSUN datasets. S

68 Dec 26, 2022

Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.

Softlearning Softlearning is a deep reinforcement learning toolbox for training maximum entropy policies in continuous domains. The implementation is

997 Dec 30, 2022

The official implementation of the Hybrid Self-Attention NEAT algorithm

PUREPLES - Pure Python Library for ES-HyperNEAT About This is a library of evolutionary algorithms with a focus on neuroevolution, implemented in pure

91 Dec 12, 2022

DCA - Official Python implementation of Delaunay Component Analysis algorithm

Delaunay Component Analysis (DCA) Official Python implementation of the Delaunay

9 Sep 6, 2022

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)

Deep Daze mist over green hills shattered plates on the grass cosmic love and attention a time traveler in the crowd life during the plague meditative

4.4k Jan 3, 2023

Learning Continuous Image Representation with Local Implicit Image Function

LIIF This repository contains the official implementation for LIIF introduced in the following paper: Learning Continuous Image Representation with Lo

1k Dec 25, 2022

Implicit Graph Neural Networks

Implicit Graph Neural Networks This repository is the official PyTorch implementation of "Implicit Graph Neural Networks". Fangda Gu*, Heng Chang*, We

48 Nov 29, 2022