An open source robotics benchmark for meta- and multi-task reinforcement learning

Overview

Meta-World

Meta-World is an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks. We aim to provide task distributions that are sufficiently broad to evaluate meta-RL algorithms' generalization ability to new behaviors.

For more background information, please refer to our website and the accompanying conference publication, which provides baseline results for 8 state-of-the-art meta- and multi-task RL algorithms.

Join the Community

Please join our mailing list ([email protected]) for infrequent announcements about the status of the benchmark, critical bugs, known issues before conference deadlines, and future plans.

Need some help? Have a question which is not quite a bug and not quite a feature request?

Join the community Slack by filling out this Google Form.

Installation

Meta-World is based on MuJoCo, which has a proprietary dependency we can't set up for you. Please follow the instructions in the mujoco-py package for help. Once you're ready to install everything, run:

pip install git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld

Alternatively, you can clone the repository and install an editable version locally:

git clone https://github.com/rlworkgroup/metaworld.git
cd metaworld
pip install -e .

Using the benchmark

Here is a list of benchmark environments for meta-RL (ML*) and multi-task RL (MT*):

  • ML1 is a meta-RL benchmark environment which tests few-shot adaptation to goal variation within a single task. You can choose to test variation within any of the 50 tasks for this benchmark.
  • ML10 is a meta-RL benchmark which tests few-shot adaptation to new tasks. It comprises 10 meta-train tasks and 5 meta-test tasks.
  • ML45 is a meta-RL benchmark which tests few-shot adaptation to new tasks. It comprises 45 meta-train tasks and 5 meta-test tasks.
  • MT10, MT1, and MT50 are multi-task RL benchmark environments for learning a multi-task policy that performs 10, 1, and 50 training tasks, respectively. MT1 is similar to ML1 because you can choose to test variation within any of the 50 tasks for this benchmark. In the original Meta-World experiments, we augment MT10 and MT50 environment observations with a one-hot vector which identifies the task. We don't enforce how users utilize the task one-hot vectors; however, one solution would be to use a Gym wrapper such as this one (a minimal sketch follows this list).
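
As an illustration, a minimal one-hot wrapper might look like the sketch below. This is only a sketch using the classic gym API; the class name and details are illustrative and are not the wrapper linked above.

import gym
import numpy as np

class OneHotTaskWrapper(gym.ObservationWrapper):
    """Append a fixed one-hot task ID to every observation (illustrative sketch)."""

    def __init__(self, env, task_index, num_tasks):
        super().__init__(env)
        self._one_hot = np.zeros(num_tasks, dtype=np.float32)
        self._one_hot[task_index] = 1.0
        # Extend the observation space to account for the appended one-hot entries.
        low = np.concatenate([env.observation_space.low, np.zeros(num_tasks)])
        high = np.concatenate([env.observation_space.high, np.ones(num_tasks)])
        self.observation_space = gym.spaces.Box(low, high, dtype=np.float32)

    def observation(self, obs):
        # Called by gym.ObservationWrapper on both reset() and step().
        return np.concatenate([obs, self._one_hot]).astype(np.float32)

Wrapping the i-th environment of an MT10/MT50 benchmark with task_index=i and num_tasks=len(benchmark.train_classes) would then yield observations augmented with the task ID.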

Basics

We provide a Benchmark API that allows constructing environments following the gym.Env interface.

To use a Benchmark, first construct it (this samples the tasks allowed for one run of an algorithm on the benchmark). Then, construct at least one instance of each environment listed in benchmark.train_classes and benchmark.test_classes. Each of those environments must then be assigned a task with env.set_task(task), drawing from benchmark.train_tasks and benchmark.test_tasks, respectively. Tasks can only be assigned to environments whose key in benchmark.train_classes or benchmark.test_classes matches task.env_name.

Please see below for some small examples using this API.

Running ML1 or MT1

import metaworld
import random

print(metaworld.ML1.ENV_NAMES)  # Check out the available environments

ml1 = metaworld.ML1('pick-place-v1') # Construct the benchmark, sampling tasks

env = ml1.train_classes['pick-place-v1']()  # Create an environment with task `pick_place`
task = random.choice(ml1.train_tasks)
env.set_task(task)  # Set task

obs = env.reset()  # Reset environment
a = env.action_space.sample()  # Sample an action
obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action

MT1 can be run the same way, except that it does not contain any test_tasks.
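
For completeness, here is a minimal MT1 sketch mirroring the ML1 example above (it assumes MT1 takes the same environment name as ML1):

import metaworld
import random

mt1 = metaworld.MT1('pick-place-v1')  # Construct the benchmark, sampling tasks

env = mt1.train_classes['pick-place-v1']()  # Create an environment
env.set_task(random.choice(mt1.train_tasks))  # MT1 has no test_tasks

obs = env.reset()  # Reset environment
a = env.action_space.sample()  # Sample an action
obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action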

Running a benchmark

Create an environment with train tasks (ML10, MT10, ML45, or MT50):

import metaworld
import random

ml10 = metaworld.ML10() # Construct the benchmark, sampling tasks

training_envs = []
for name, env_cls in ml10.train_classes.items():
  env = env_cls()
  task = random.choice([task for task in ml10.train_tasks
                        if task.env_name == name])
  env.set_task(task)
  training_envs.append(env)

for env in training_envs:
  obs = env.reset()  # Reset environment
  a = env.action_space.sample()  # Sample an action
  obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action

Create an environment with test tasks (this only works for ML10 and ML45, since MT10 and MT50 don't have a separate set of test tasks):

import metaworld
import random

ml10 = metaworld.ML10() # Construct the benchmark, sampling tasks

testing_envs = []
for name, env_cls in ml10.test_classes.items():
  env = env_cls()
  task = random.choice([task for task in ml10.test_tasks
                        if task.env_name == name])
  env.set_task(task)
  testing_envs.append(env)

for env in testing_envs:
  obs = env.reset()  # Reset environment
  a = env.action_space.sample()  # Sample an action
  obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action

Citing Meta-World

If you use Meta-World for academic research, please cite our CoRL 2019 paper using the following BibTeX entry.

@inproceedings{yu2019meta,
  title={Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning},
  author={Tianhe Yu and Deirdre Quillen and Zhanpeng He and Ryan Julian and Karol Hausman and Chelsea Finn and Sergey Levine},
  booktitle={Conference on Robot Learning (CoRL)},
  year={2019},
  eprint={1910.10897},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/1910.10897}
}

Accompanying Baselines

If you're looking for implementations of the baseline algorithms used in the Meta-World conference publication, please see our sister project, garage. Note that these aren't the exact same baselines that were used in the original conference publication; however, they are faithful to the original baselines.

Become a Contributor

We welcome all contributions to Meta-World. Please refer to the contributor's guide for how to prepare your contributions.

Acknowledgements

Meta-World is a work by Tianhe Yu (Stanford University), Deirdre Quillen (UC Berkeley), Zhanpeng He (Columbia University), Ryan Julian (University of Southern California), Karol Hausman (Google AI), Chelsea Finn (Stanford University) and Sergey Levine (UC Berkeley).

The code for Meta-World was originally based on multiworld, which is developed by Vitchyr H. Pong, Murtaza Dalal, Ashvin Nair, Shikhar Bahl, Steven Lin, Soroush Nasiriany, Kristian Hartikainen and Coline Devin. The Meta-World authors are grateful for their efforts in providing such a great framework as the foundation of our work. We would also like to thank Russell Mendonca for his work on the reward functions for some of the environments.

Comments
  • All environments produce observations outside of observation space.

    The following is a minimal working example which shows that all of the environments produce observations outside of their observation space. All it does is iterate over each environment from ML1, sample and set a task for the given environment, then take random actions in the environment and test whether or not the observations are inside the observation space, and at which indices (if any) an observation lies outside of the bounds of the observation space. You will get different results depending on the value of TIMESTEPS_PER_ENV, but setting this value to 1000 should yield violating observations for most environments. This is an issue, say, for RL implementations like RLlib which expect observations to be inside the observation space, and makes the environment incompatible with such libraries. This might be related to issue #31, though that issue only points out incorrect observation space boundaries regarding the goal coordinates, and the script below should point out that there are violations in other dimensions as well.

    import numpy as np
    from metaworld.benchmarks import ML1
    
    TIMESTEPS_PER_ENV = 1000
    
    def main():
    
        # Iterate over environment names.
        for env_name in ML1.available_tasks():
    
            # Create environment.
            env = ML1.get_train_tasks(env_name)
            tasks = env.sample_tasks(1)
            env.set_task(tasks[0])
    
            # Get boundaries of observation space and initial observation.
            low = env.observation_space.low
            high = env.observation_space.high
            obs = env.reset()
    
            # Create list of indices of observation space whose bounds are violated.
            broken_indices = []
    
            # Run environment.
            for _ in range(TIMESTEPS_PER_ENV):
    
                # Test if observation is outside observation space.
                if np.any(np.logical_or(obs < low, obs > high)):
                    current_indices = np.argwhere(np.logical_or(obs < low, obs > high))
                    current_indices = current_indices.reshape((-1,)).tolist()
                    for current_index in current_indices:
                        if current_index not in broken_indices:
                            broken_indices.append(current_index)
        
                # Sample action and perform environment step.
                a = env.action_space.sample()
                obs, reward, done, info = env.step(a)
    
            # Print out which indices of observation space were violated.
            broken_indices = sorted(broken_indices)
            print("%s broken indices: %r" % (env_name, broken_indices))
    
    if __name__ == "__main__":
        main()
    
    good first issue 
    opened by mtcrawshaw 27
  • Vectorizing Envs over Many Workers Results in Memory Overflow

    Currently, I'm using RLlib for running metaworld envs, where each worker runs many vectorized instances of the environment.

    I tried running MAML/ProMP with 40 workers and 20 envs/worker on one of the meta-envs (Push). RLlib can train on this for a couple of iterations before crashing due to memory overflow (it can't allocate more memory). I'm not sure what exactly the issue is, but do you have some leads on what it could be? I was thinking that it might be a memory leak, but trying this with a lower number of workers resulted in worse training overall but, most importantly, no crashing.

    opened by michaelzhiluo 21
  • Updated Observation Space for SawyerReachPushPickPlaceEnv

    Fix for issue #39 for a single environment. A detailed explanation of the logic behind these changes is found here: https://github.com/rlworkgroup/metaworld/issues/39#issuecomment-632422667

    opened by adibellathur 12
  • ML1 Tasks for Constant Goals

    Currently, we are trying to use specific environments in ML1 to set a goal constant per task in a MAML-setting (with env.reset() meaning that initial positions change but goal stays constant)

    However, we are not clear on what a task means in the ML1 setting. Based on the code for one of the environments we are trying to run, it seems like calling self.set_task will update self.goal. However, when the environment is reset, self._state_goal is initially self.goal but is then assigned a randomly generated goal + a concatenation of initial reacher arm positions, which also appears to be random. When self.random_init is False, it works as intended but the starting states are constant.

    We are wondering if there is a way to define a task using the Meta-World API such that, for a given task, the goal position is held constant but the initial observation changes when env.reset() is called.

    bug good first issue 
    opened by michaelzhiluo 12
  • Scripted policies for reach-push-pick-place environment

    Hard-coded policies for the sawyer reach-push-pick-place environment. Since random_init=True, a test may fail occasionally, but that hasn't happened to me yet.

    Note that these tests will fail on master, as the env is not solvable without changes from #95

    Partially addresses #90

    opened by haydenshively 11
  • Missing one-hot vector

    Hello :)

    The documentation mentions that MT10 and MT50 augment environment observations with a one-hot vector which identifies the task. When I create an MT10 instance (using the code below), I do not get the task ID. Could you please explain what I am missing?

    import metaworld
    import random
    
    mt10 = metaworld.MT10() # Construct the benchmark, sampling tasks
    
    training_envs = []
    for name, env_cls in mt10.train_classes.items():
      env = env_cls()
      task = random.choice([task for task in mt10.train_tasks
                            if task.env_name == name])
      env.set_task(task)
      training_envs.append(env)
    
    for env in training_envs:
      obs = env.reset()  # Reset environment
      a = env.action_space.sample()  # Sample an action
      obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action
    

    The shape of obs is (12,) and not (13,)

    opened by shagunsodhani 10
  • Reproducing Figure 11 and reporting success rate

    Hi all and @avnishn,

    I've been trying to reproduce results from Figure 11 in https://arxiv.org/pdf/1910.10897.pdf using https://github.com/rlworkgroup/garage/blob/08492007d6e2d9ead9beb83a8a4247e52019ac7d/metaworld_examples/sac_metaworld.py and hyper-parameters reported in Table 3. Should I use Table 3 for hyper-parameters?

    One thing which is not clear to me is how the success rate is reported. I notice that env.step returns 'success' in info, but I want to verify that this is what is reported in the paper. Here is the code that I use to report results (random actions are used for simplicity):

    from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE
    env_cls = ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE['hammer-v2-goal-observable']
    eval_env= env_cls(seed=0)
    eval_env.seed(0)
    avg_reward = 0 
    success_rate = 0 
    num_evals = 2
    
    for _ in range(num_evals):
        obs = eval_env.reset()
        done = False
        stp = 0
        while not done and stp < eval_env.max_path_length:
            obs, reward, done, info = eval_env.step(eval_env.action_space.sample())
            avg_reward += reward
            stp += 1
            if 'success' in info:
                success_rate += info['success']
    avg_reward /= num_evals
    success_rate /= num_evals
    

    Is this the right way to report the success rate as in Figure 11? Thanks for your help. Rasool
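
    For comparison, here is one common convention (an assumption on my part, not confirmed in this thread): count an episode as successful if info['success'] is ever 1 during the episode, then average over evaluation episodes.

    import numpy as np
    from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE

    env = ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE['hammer-v2-goal-observable'](seed=0)
    num_evals = 10
    successes = []
    for _ in range(num_evals):
        obs = env.reset()
        episode_success = 0.0
        for _ in range(env.max_path_length):
            obs, reward, done, info = env.step(env.action_space.sample())
            # An episode counts as a success if the success flag is ever raised.
            episode_success = max(episode_success, info.get('success', 0.0))
            if done:
                break
        successes.append(episode_success)
    success_rate = float(np.mean(successes))  # fraction of successful episodes
    print(success_rate)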

    opened by rasoolfa 8
  • Some questions about the metaworld environment

    I'm interested in your Meta-World. It will be a good benchmark for meta-RL. I have some questions about this benchmark.

    1. When an environment starts, the Sawyer arm moves to a specific pose during the first K steps (K is about 10). It seems like an 'init' function called by MuJoCo. I think this will cause problems when the agent does reinforcement learning. Is this intended?

    2. The reward gap between some environments is big. While the reach env gets a reward of almost 100 right away, the push env gets a small reward (between 0 and 1). Is this intended?

    question 
    opened by pigranya1218 8
  • Remote rendering issue with env.render(mode='rgb_array') and env.get_image()

    This is an issue I typically have with Mujoco-based simulations when running remotely. Basically rendering is problematic, even in rgb_array mode where we want to access the image frames.

    Is there a workaround for this issue? The OpenAI Gym envs have the same issue, and the only framework I know that somehow bypasses this is DM Suite, where I can render in rgb_array mode.

    The errors you would get are the following:

    • env.render(mode='rgb_array')
    GLFW error (code %d): %s 65542 b'EGL: Failed to get EGL display: Success'
    Creating window glfw
    X Error of failed request:  255
      Major opcode of failed request:  155 (GLX)
      Minor opcode of failed request:  5 (X_GLXMakeCurrent)
      Serial number of failed request:  136
      Current serial number in output stream:  137
    
    • env.get_image(width=84, height=84)
    ERROR: GLEW initalization error: Missing GL version
    
    opened by melfm 8
  • Missing Environments

    If you try running scripts/demo_sawyer.py, many of the imports don't work because of missing environments, such as from metaworld.envs.mujoco.sawyer_xyz.sawyer_stack import SawyerStackEnv

    bug 
    opened by jcoreyes 8
  • Confused about the success rate

    I am new to Meta-World! Thank you for putting together such a meaningful and complex project. I have some questions about it. In the project setting, does the success rate represent success in one step or in one episode? Because the agent runs until the max path length in one episode (e.g. 150 steps), does the success rate represent the average success over the 150 steps or the success at the last step? Can you give me some pointers?

    question 
    opened by jp18813100494 7
  • [BUG] Fix reset() method for v2

    The reset() method does not reset self._prev_obs. This creates an inconsistent reset state, which is supposed to be deterministic for the same task in the same environment (by task, I mean parametric variation).

    import metaworld
    
    benchmark = metaworld.MT1("reach-v2", 0)
    env = benchmark.train_classes["reach-v2"]()
    env.set_task(benchmark.train_tasks[0])
    
    reset_1 = env.reset()
    reset_2 = env.reset()
    _ = env.step(env.action_space.sample())
    reset_3 = env.reset()
    reset_4 = env.reset()
    
    print(reset_1 == reset_2)
    """
    [ True  True  True  True  True  True  True  True  True  True  True  True
      True  True  True  True  True  True  True  True  True  True  True  True
      True  True  True  True  True  True  True  True  True  True  True  True
      True  True  True]
    """
    
    print(reset_1 == reset_3)
    """
    [ True  True  True  True  True  True  True  True  True  True  True  True
      True  True  True  True  True  True False False False  True False False
     False False False False False  True  True  True  True  True  True  True
      True  True  True]
    """
    
    print(reset_1 == reset_4)
    """
    [ True  True  True  True  True  True  True  True  True  True  True  True
      True  True  True  True  True  True  True  True  True  True  True  True
      True  True  True  True  True  True  True  True  True  True  True  True
      True  True  True]
    """
    

    The following changes will return all True for reset_1 == reset_3.

    opened by gunnxx 0
  • [BUG] ValueError during training procedure

    Hi. I've been trying to reproduce the results of the Meta-World paper (Figure 11) with RLlib.

    Sometimes during the training procedure, a ValueError appears. The error messages look like this:

    ValueError: ('Observation ({} dtype={}) outside given space ({})!', array([ 0.54684764,  0.44018602,  0.5549992 ,  0.3511533 , -0.0165956 ,
             0.57203245,  0.01993605,  0.        ,  0.        ,  0.        ,
             1.        ,  0.        ,  0.        ,  0.        ,  0.        ,
             0.        ,  0.        ,  0.        ,  0.5476604 ,  0.44060594,
             0.55454004,  0.34979662, -0.0165956 ,  0.57203245,  0.01993605,
             0.        ,  0.        ,  0.        ,  1.        ,  0.        ,
             0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
             0.        ,  0.        ,  0.        ,  0.        ], dtype=float32), dtype('float32'), Box([-0.525   0.348  -0.0525 -1.        -inf    -inf    -inf    -inf    -inf
         -inf    -inf    -inf    -inf    -inf    -inf    -inf    -inf    -inf
      -0.525   0.348  -0.0525 -1.        -inf    -inf    -inf    -inf    -inf
         -inf    -inf    -inf    -inf    -inf    -inf    -inf    -inf    -inf
       0.      0.      0.    ], [0.525 1.025 0.7   1.      inf   inf   inf   inf   inf   inf   inf   inf
        inf   inf   inf   inf   inf   inf 0.525 1.025 0.7   1.      inf   inf
        inf   inf   inf   inf   inf   inf   inf   inf   inf   inf   inf   inf
      0.    0.    0.   ], (39,), float32))
     
    
     During handling of the above exception, another exception occurred:
      Traceback (most recent call last):
       File "/opt/anaconda3/envs/metarl2/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 725, in _worker_health_check
         ray.get(obj_ref)
       File "/opt/anaconda3/envs/metarl2/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
         return func(*args, **kwargs)
       File "/opt/anaconda3/envs/metarl2/lib/python3.9/site-packages/ray/_private/worker.py", line 2275, in get
         raise value.as_instanceof_cause()
     ray.exceptions.RayTaskError(StopIteration): ray::RolloutWorker.sample_with_count() (pid=64294, ip=163.152.162.213, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f3032a94b50>)
       File "/opt/anaconda3/envs/metarl2/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 866, in sample_with_count
         batch = self.sample()
       File "/opt/anaconda3/envs/metarl2/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 806, in sample
         batches = [self.input_reader.next()]
       File "/opt/anaconda3/envs/metarl2/lib/python3.9/site-packages/ray/rllib/evaluation/sampler.py", line 92, in next
         batches = [self.get_data()]
       File "/opt/anaconda3/envs/metarl2/lib/python3.9/site-packages/ray/rllib/evaluation/sampler.py", line 282, in get_data
         item = next(self._env_runner)
     StopIteration
    

    How can I fix this error? I think it is related to this issue.

    [Dependencies]

    python==3.9.7 torch==1.11.0+cu113 mujoco-py==2.1.2.14 mujoco==2.1.0 ray==2.0.0

    [Codes]

    
    import metaworld
    import os
    import random
    import numpy as np
    from torch.utils.tensorboard import SummaryWriter
    
    import ray
    from ray.tune.registry import register_env
    from ray.rllib.agents.ppo import PPOTrainer, PPOConfig
    from ray.tune.logger import pretty_print
    from custom_metric_callback import MyCallbacks
    import metaworld
    from metaworld.envs import (ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE,
                                ALL_V2_ENVIRONMENTS_GOAL_HIDDEN)
    
    hidden_env_names = ['assembly-v2-goal-hidden', 'basketball-v2-goal-hidden', 'bin-picking-v2-goal-hidden', 'box-close-v2-goal-hidden',
                        'button-press-topdown-v2-goal-hidden', 'button-press-topdown-wall-v2-goal-hidden',
                        'button-press-v2-goal-hidden', 'button-press-wall-v2-goal-hidden', 'coffee-button-v2-goal-hidden',
                        'coffee-pull-v2-goal-hidden', 'coffee-push-v2-goal-hidden', 'dial-turn-v2-goal-hidden',
                        'disassemble-v2-goal-hidden', 'door-close-v2-goal-hidden', 'door-lock-v2-goal-hidden', 'door-open-v2-goal-hidden',
                        'door-unlock-v2-goal-hidden', 'hand-insert-v2-goal-hidden', 'drawer-close-v2-goal-hidden',
                        'drawer-open-v2-goal-hidden', 'faucet-open-v2-goal-hidden', 'faucet-close-v2-goal-hidden', 'hammer-v2-goal-hidden',
                        'handle-press-side-v2-goal-hidden', 'handle-press-v2-goal-hidden', 'handle-pull-side-v2-goal-hidden',
                        'handle-pull-v2-goal-hidden', 'lever-pull-v2-goal-hidden', 'peg-insert-side-v2-goal-hidden',
                        'pick-place-wall-v2-goal-hidden', 'pick-out-of-hole-v2-goal-hidden', 'reach-v2-goal-hidden',
                        'push-back-v2-goal-hidden', 'push-v2-goal-hidden', 'pick-place-v2-goal-hidden', 'plate-slide-v2-goal-hidden',
                        'plate-slide-side-v2-goal-hidden', 'plate-slide-back-v2-goal-hidden',
                        'plate-slide-back-side-v2-goal-hidden', 'peg-unplug-side-v2-goal-hidden', 'soccer-v2-goal-hidden',
                        'stick-push-v2-goal-hidden', 'stick-pull-v2-goal-hidden', 'push-wall-v2-goal-hidden', 'reach-wall-v2-goal-hidden',
                        'shelf-place-v2-goal-hidden', 'sweep-into-v2-goal-hidden', 'sweep-v2-goal-hidden', 'window-open-v2-goal-hidden',
                        'window-close-v2-goal-hidden']
    
    observable_env_names = ['assembly-v2-goal-observable', 'basketball-v2-goal-observable', 'bin-picking-v2-goal-observable', 'box-close-v2-goal-observable',
                            'button-press-topdown-v2-goal-observable', 'button-press-topdown-wall-v2-goal-observable',
                            'button-press-v2-goal-observable', 'button-press-wall-v2-goal-observable', 'coffee-button-v2-goal-observable',
                            'coffee-pull-v2-goal-observable', 'coffee-push-v2-goal-observable', 'dial-turn-v2-goal-observable',
                            'disassemble-v2-goal-observable', 'door-close-v2-goal-observable', 'door-lock-v2-goal-observable', 'door-open-v2-goal-observable',
                            'door-unlock-v2-goal-observable', 'hand-insert-v2-goal-observable', 'drawer-close-v2-goal-observable',
                            'drawer-open-v2-goal-observable', 'faucet-open-v2-goal-observable', 'faucet-close-v2-goal-observable', 'hammer-v2-goal-observable',
                            'handle-press-side-v2-goal-observable', 'handle-press-v2-goal-observable', 'handle-pull-side-v2-goal-observable',
                            'handle-pull-v2-goal-observable', 'lever-pull-v2-goal-observable', 'peg-insert-side-v2-goal-observable',
                            'pick-place-wall-v2-goal-observable', 'pick-out-of-hole-v2-goal-observable', 'reach-v2-goal-observable',
                            'push-back-v2-goal-observable', 'push-v2-goal-observable', 'pick-place-v2-goal-observable', 'plate-slide-v2-goal-observable',
                            'plate-slide-side-v2-goal-observable', 'plate-slide-back-v2-goal-observable',
                            'plate-slide-back-side-v2-goal-observable', 'peg-unplug-side-v2-goal-observable', 'soccer-v2-goal-observable',
                            'stick-push-v2-goal-observable', 'stick-pull-v2-goal-observable', 'push-wall-v2-goal-observable', 'reach-wall-v2-goal-observable',
                            'shelf-place-v2-goal-observable', 'sweep-into-v2-goal-observable', 'sweep-v2-goal-observable', 'window-open-v2-goal-observable',
                            'window-close-v2-goal-observable']
    
    def env_creator_hidden(env_config):
        env_name = env_config["env"]
        SEED = env_config["seed"]
        env_cls = ALL_V2_ENVIRONMENTS_GOAL_HIDDEN[env_name]
        env = env_cls(seed=SEED)
        env.seed(SEED)
        random.seed(SEED)
        return env
    
    def env_creator_observable(env_config):
        env_name = env_config["env"]
        SEED = env_config["seed"]
        env_cls = ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_name]
        env = env_cls(seed=SEED)
        env.seed(SEED)
        random.seed(SEED)
        return env
    
    for env_name in hidden_env_names:
        register_env(env_name, env_creator_hidden)
    
    for env_name in observable_env_names:
        register_env(env_name, env_creator_observable)
    
    
    num_gpus = 8
    num_envs = len(observable_env_names)
    gpu_fractions = num_gpus / num_envs
    
    @ray.remote(num_cpus=2, num_gpus=gpu_fractions)
    def distributed_trainer(env_name):
        config = PPOConfig()
        config.training(
            gamma=0.99,
            lr=0.0005,
            train_batch_size=2000,
            model={
                "fcnet_hiddens": [128, 128],
                "fcnet_activation": "tanh",
            },
            use_gae=True,
            lambda_=0.95,
            vf_loss_coeff=0.2,
            entropy_coeff=0.001,
            num_sgd_iter=5,
            sgd_minibatch_size=64,
            shuffle_sequences=True,
        )\
            .resources(
                num_gpus=1,
                num_cpus_per_worker=1,
        )\
            .framework(
                framework='torch'
        )\
            .environment(
                env=env_name,
                env_config={"env": env_name, "seed": 1}
        )\
            .rollouts(
                num_rollout_workers=2,
                num_envs_per_worker=1,
                create_env_on_local_worker=False,
                rollout_fragment_length=250,
                horizon=500,
                soft_horizon=False,
                no_done_at_end=False,
                ignore_worker_failures=True,
                recreate_failed_workers=True,
                restart_failed_sub_environments=True,
        )\
            #.callbacks(MyCallbacks)
    
        trainer = PPOTrainer(env=env_name, config=config)
        print(f"env_name: {env_name}")
        print("ray.get_gpu_ids(): {}".format(ray.get_gpu_ids()))
        print("CUDA_VISIBLE_DEVICES: {}".format(
            os.environ["CUDA_VISIBLE_DEVICES"]))
    
        for epoch in range(10000):
            result = trainer.train()
            result.pop('info')
            result.pop('sampler_results')
            if epoch % 200 == 0:
                custom_metrics = result["custom_metrics"]
                print(
                    f"env_name: {env_name}, epoch: {epoch}, \n custom_metrics: {custom_metrics}")
                print(pretty_print(result))
                checkpoint = trainer.save()
                print("checkpoint saved at", checkpoint)
    
        return 0
    
    distributed_trainier_refs = [distributed_trainer.remote(env_name) for env_name in hidden_env_names]
    results = ray.get(distributed_trainier_refs)
    
    distributed_trainier_refs = [distributed_trainer.remote(env_name) for env_name in observable_env_names]
    results = ray.get(distributed_trainier_refs)
    
    
    opened by neverparadise 0
  • Feature request: Online updating leaderboard

    I think this would be really cool, and good to track the progress of the community. A possible service that already exists for hosting this kind of thing: https://eval.ai/web/challenges/list

    opened by ezhang7423 1
  • Why does it not succeed even when the score is high?

    I tested some environments with different task settings. For example, on reach-v2 I find it strange that the task does not succeed even when the score is 4700+. I tried to add an extra score on success, but it didn't work. Are there any tricks to solve this problem?

    opened by alexxchen 6
  • Error when setting rand_init to False in some environments

    In the V2 push-wall, pick-place-wall, push-back, and shelf-place environments, setting rand_init to False and then resetting the environment would lead to the following error when calling step():

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/mnt/beegfs/home/zchuning/metaworld/metaworld/envs/mujoco/mujoco_env.py", line 25, in inner
        return func(*args, **kwargs)
      File "/mnt/beegfs/home/zchuning/metaworld/metaworld/envs/mujoco/sawyer_xyz/v2/sawyer_push_wall_v2.py", line 79, in evaluate_state
        ) = self.compute_reward(action, obs)
      File "/mnt/beegfs/home/zchuning/metaworld/metaworld/envs/mujoco/sawyer_xyz/v2/sawyer_push_wall_v2.py", line 157, in compute_reward
        object_grasped = self._gripper_caging_reward(
      File "/mnt/beegfs/home/zchuning/metaworld/metaworld/envs/mujoco/sawyer_xyz/sawyer_xyz_env.py", line 572, in _gripper_caging_reward
        caging_xz_margin = np.linalg.norm(self.obj_init_pos[xz] - self.init_tcp[xz])
    TypeError: list indices must be integers or slices, not list
    

    This is because the reset() function of these environments sets self.obj_init_pos to self.adjust_initObjPos(self.init_config['obj_init_pos']) when self.random_init is False (e.g. line 112 in sawyer_push_wall_v2.py). However, self.adjust_initObjPos() returns a Python list instead of a numpy array, so line 572 of sawyer_xyz_env.py triggers an error by indexing a Python list with a list of indices.

    A simple fix is to wrap a np.array() around self.adjust_initObjPos(self.init_config['obj_init_pos']), similar to how some other environments (e.g. Push V2) handle calls to fix_extreme_obs_pos(). But there might be a more systematic way to fix this, hence the github issue instead of a pull request.
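
    To illustrate the failure mode and the suggested fix (the position values and index list below are illustrative, not taken from the environment code):

    import numpy as np

    obj_init_pos = [0.05, 0.6, 0.015]   # a python list, as returned by adjust_initObjPos()
    xz = [0, 2]                         # fancy-index list, as used in _gripper_caging_reward()
    # obj_init_pos[xz] would raise: TypeError: list indices must be integers or slices, not list
    obj_init_pos = np.array(obj_init_pos)  # the suggested one-line fix: convert to a numpy array
    print(obj_init_pos[xz])                # fancy indexing now works -> [0.05  0.015]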

    opened by zchuning 0
  • Incorrect reset space for object in disassemble

    The lower bound for the random reset space of the object in disassemble is higher than the upper bound for the first index. This appears to be an issue for both v1 https://github.com/rlworkgroup/metaworld/blob/18118a28c06893da0f363786696cc792457b062b/metaworld/envs/mujoco/sawyer_xyz/v1/sawyer_disassemble_peg.py#L14-L15 and v2 https://github.com/rlworkgroup/metaworld/blob/18118a28c06893da0f363786696cc792457b062b/metaworld/envs/mujoco/sawyer_xyz/v2/sawyer_disassemble_peg_v2.py#L15-L16

    np.random.uniform seems to swap the values when low > high. However, when using explicit seeding (self.np_random, seed = seeding.np_random(seed)), self.np_random.uniform raises a ValueError.

    opened by ottofabian 0