RoboDesk A Multi-Task Reinforcement Learning Benchmark

If you find this open source release useful, please cite it in your paper:

@misc{kannan2021robodesk,
  author = {Harini Kannan and Danijar Hafner and Chelsea Finn and Dumitru Erhan},
  title = {RoboDesk: A Multi-Task Reinforcement Learning Benchmark},
  year = {2021},
  howpublished = {\url{https://github.com/google-research/robodesk}},
}

Highlights

  • Diversity: RoboDesk includes 9 diverse tasks that test for a variety of different behaviors within the same environment, making it useful for evaluating transfer, multi-task learning, and global exploration.
  • Complexity: The high-dimensional image inputs contain objects of different shapes and colors, whose initial positions are randomized to avoid naive memorization and require learning algorithms to generalize.
  • Robustness: We carefully designed and tested RoboDesk to ensure fast and stable physics simulation. This prevents objects from intersecting, getting stuck, or quickly flying away, a common problem with some existing environments.
  • Lightweight: RoboDesk comes as a self-contained Python package with few dependencies. The source code is clean and pragmatic, making it a useful blueprint for creating new MuJoCo environments.

Training Agents

Installation: pip3 install -U robodesk

The environment follows the OpenAI Gym interface:

import robodesk

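# The constructor does not accept a seed argument (see Constructor under Environment Details).
env = robodesk.RoboDesk()
obs = env.reset()
# Observations are dictionaries; the image is stored under the 'image' key.
assert obs['image'].shape == (64, 64, 3)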

done = False
while not done:
  action = env.action_space.sample()
  obs, reward, done, info = env.step(action)

Tasks

Robodesk Tasks

The behaviors above were learned using the Dreamer agent. These policies have been learned from scratch and only from pixels, not proprioceptive states.

| Task | Description |
| --- | --- |
| open_slide | Push the sliding door all the way to the right, navigating around the other objects. |
| open_drawer | Pull the dark brown drawer all the way open. |
| push_green | Push the green button to turn the green light on. |
| stack_blocks | Stack the upright blue block on top of the flat green block. |
| upright_block_off_table | Push the blue upright block off the table. |
| flat_block_in_bin | Push the green flat block into the blue bin. |
| flat_block_in_shelf | Push the green flat block into the shelf, navigating around the other blocks. |
| lift_upright_block | Grasp the blue upright block and lift it above the table. |
| lift_ball | Grasp the magenta ball and lift it above the table. |

Environment Details

Constructor

robodesk.RoboDesk(task='open_slide', reward='dense', action_repeat=1, episode_length=500, image_size=64)

| Parameter | Description |
| --- | --- |
| task | Available tasks are open_slide, open_drawer, push_green, stack, upright_block_off_table, flat_block_in_bin, flat_block_in_shelf, lift_upright_block, lift_ball. |
| reward | Available reward types are dense, sparse, success. Success gives only the first sparse reward during the episode, which is useful for computing success rates during evaluation. |
| action_repeat | Reduces the control frequency by applying each action multiple times. This is faster than using an environment wrapper because only the needed images are rendered. |
| episode_length | Time limit for the episode; can be None. |
| image_size | Size of the image observations in pixels, used for both height and width. |
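
To illustrate the action_repeat remark, here is a minimal sketch of the generic wrapper approach that the built-in parameter is compared against. CountingEnv and ActionRepeat are hypothetical stand-ins and not part of RoboDesk, and summing rewards over the repeated steps is an assumed convention rather than documented RoboDesk behavior.

```python
class CountingEnv:
    """Hypothetical stand-in environment that renders an image on every step."""

    def __init__(self):
        self.renders = 0

    def step(self, action):
        self.renders += 1  # a real environment would render an image here
        obs = {'image': None}
        return obs, 0.1, False, {}


class ActionRepeat:
    """Generic wrapper that applies each action `repeat` times."""

    def __init__(self, env, repeat):
        self.env = env
        self.repeat = repeat

    def step(self, action):
        total = 0.0
        for _ in range(self.repeat):
            obs, reward, done, info = self.env.step(action)
            total += reward  # assumption: rewards are summed across repeats
            if done:
                break
        return obs, total, done, info


env = CountingEnv()
wrapped = ActionRepeat(env, repeat=4)
obs, reward, done, info = wrapped.step([0.0] * 5)
print(env.renders)  # 4: one render per inner step, although only the last image is returned
```

With a wrapper like this, every inner step pays the rendering cost, which is the overhead the built-in action_repeat parameter is described as avoiding.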

Reward

All rewards are bounded between 0 and 1. There are three types of rewards available:

  • Dense rewards are based on Euclidean distances between the objects and their target positions and can include additional terms, for example to encourage the arm to reach the object. These are the easiest rewards for learning.
  • Sparse rewards are either 0 or 1 based on whether the target object is in the target area or not, according to a fixed threshold. Learning from sparse rewards is more challenging.
  • Success rewards are equivalent to the sparse rewards, except that only the first reward is given during each episode. As a result, an episode return of 0 means failure and 1 means success at the task. This should only be used during evaluation.
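
The relationship between sparse and success rewards can be sketched as follows. The helper names are hypothetical and the code operates on plain lists of per-step sparse rewards; it illustrates the description above and is not RoboDesk's implementation.

```python
def success_from_sparse(sparse_rewards):
    """Keep only the first positive sparse reward of an episode."""
    given = False
    out = []
    for r in sparse_rewards:
        if r > 0 and not given:
            out.append(1.0)
            given = True
        else:
            out.append(0.0)
    return out


def success_rate(episodes):
    """Fraction of episodes whose success return is 1."""
    returns = [sum(success_from_sparse(ep)) for ep in episodes]
    return sum(returns) / len(returns)


print(success_from_sparse([0, 0, 1, 1, 0, 1]))  # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(success_rate([[0, 1, 1], [0, 0, 0]]))     # 0.5
```

Because each episode's success return is either 0 or 1, averaging returns over evaluation episodes gives the success rate directly.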

Termination

Episodes end after 500 time steps by default. There are no early terminations.

Observation Space

Each observation is a dictionary that contains the current image, as well as additional information. For the standard benchmark, only the image should be used for learning. The observation dictionary contains the following keys:

| Key | Space |
| --- | --- |
| image | Box(0, 255, (64, 64, 3), np.uint8) |
| qpos_robot | Box(-np.inf, np.inf, (9,), np.float32) |
| qvel_robot | Box(-np.inf, np.inf, (9,), np.float32) |
| qpos_objects | Box(-np.inf, np.inf, (26,), np.float32) |
| qvel_objects | Box(-np.inf, np.inf, (26,), np.float32) |
| end_effector | Box(-np.inf, np.inf, (3,), np.float32) |
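
Since only the image should be used for the standard benchmark, one way to keep the extra entries out of the learner's input is to split the dictionary up front. This is a minimal sketch: split_observation is a hypothetical helper, and the observation below is a placeholder with dummy values rather than real environment output.

```python
# Non-image keys listed in the observation table above.
EXTRA_KEYS = ('qpos_robot', 'qvel_robot', 'qpos_objects',
              'qvel_objects', 'end_effector')


def split_observation(obs):
    """Return (image, extras), where extras holds the non-image entries."""
    image = obs['image']
    extras = {key: obs[key] for key in EXTRA_KEYS if key in obs}
    return image, extras


# Placeholder observation with the documented keys; values are dummies.
fake_obs = {key: None for key in ('image',) + EXTRA_KEYS}
fake_obs['image'] = 'pixels'

image, extras = split_observation(fake_obs)
print(sorted(extras))
```

The agent would then receive only `image`, while `extras` remains available for logging or debugging.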

Action Space

RoboDesk uses end effector control with a simple bounded action space:

Box(-1, 1, (5,), np.float32)
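
A policy's raw outputs may fall outside these bounds. The following minimal sketch clips a 5-dimensional action into [-1, 1] before it is passed to the environment. clip_action is a hypothetical helper, and whether RoboDesk clips actions internally is not stated here.

```python
ACTION_DIM = 5  # matches the (5,) shape of the action space above


def clip_action(action, low=-1.0, high=1.0):
    """Clip each action component into [low, high]."""
    if len(action) != ACTION_DIM:
        raise ValueError(f'expected {ACTION_DIM} values, got {len(action)}')
    return [min(high, max(low, float(a))) for a in action]


print(clip_action([2.0, -3.0, 0.5, 0.0, 1.0]))  # [1.0, -1.0, 0.5, 0.0, 1.0]
```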

Acknowledgements

We thank Ben Eysenbach and Debidatta Dwibedi for their helpful feedback.

Our benchmark builds upon previously open-sourced work. We build upon the desk XMLs first introduced in [1], the Franka XMLs open-sourced in [2], and the Franka meshes open-sourced in [3].

Questions

Please open an issue on GitHub.

Disclaimer: This is not an official Google product.

Comments
  • Adds noise distractor options

    The main mechanism for adding distractors is utils.EnvElementManager objects, each of which conceptually manages the evolution of a single element. Its methods reset(), step(), and pre_render() are called at the corresponding points in the main environment. This keeps all code related to one distractor in one place, which makes it much more readable.

    Additionally, the RoboDesk class is split into a base class RoboDeskBase and subclasses RoboDesk and RoboDeskWithTV. The base class exposes very detailed parameters for distractors. The subclasses:

    1. Specify different XMLs (most notably, one XML has a TV and one does not).
    2. Use the distractor parameters appropriate for the corresponding scene.
    3. Provide an easy-to-use API for distractors: RoboDesk(..., distractor="all") or ="none" or ={'env_light', 'camera'}.
    4. Have different camera viewpoints, specified via CameraSpec.

    Notably among EnvElementManagers:

    • The CameraManager is created from CameraSpec and is the definitive entry point to create a rendered view of the scene (as it manages camera location).
    • The ButtonManager models potentially noisy button sensor readings, so elements that change with respect to them (i.e., indicator lights on the desk and the TV hue) should read from its method get_normalized_button.
    • The TVManager loads video files and updates frames in the MuJoCo model.

    This PR is built on top of https://github.com/google-research/robodesk/pull/3. Let me know how I can better facilitate review.

    Finally, if this gets merged, could I be added to readme/acknowledgement/bibtex so that I can help with potential issues people have with the distractor options?

    opened by ssnl 0
  • Are there baseline agents?

    Hi,

    Thank you for open-sourcing the environment. I was wondering if you have pointers to implementations of baseline agents of these environments?

    I understand that there are visualizations of the Dreamer agent, but when I went to the Dreamer agent codebase, if I am not mistaken, there is not already an option to run the Dreamer agent on the RoboDesk tasks.

    Thank you!

    opened by quanvuong 0
  • RoboDesk does not have argument seed

    Hi,

    Thanks for the great project!

    I try to follow the README but it seems that seed is not a valid argument. Running robodesk.RoboDesk(seed=0) will throw an error.

    Best, Yicheng

    opened by ethanluoyc 0