An open source robotics benchmark for meta- and multi-task reinforcement learning

Overview

Meta-World

Meta-World is an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks. We aim to provide task distributions that are sufficiently broad to evaluate meta-RL algorithms' generalization ability to new behaviors.

For more background information, please refer to our website and the accompanying conference publication, which provides baseline results for 8 state-of-the-art meta- and multi-task RL algorithms.

Join the Community

Please join our mailing list ([email protected]) for infrequent announcements about the status of the benchmark, critical bugs, known issues before conference deadlines, and future plans.

Need some help? Have a question which is not quite a bug and not quite a feature request?

Join the community Slack by filling out this Google Form.

Installation

Meta-World is based on MuJoCo, which has a proprietary dependency we can't set up for you. Please follow the instructions in the mujoco-py package for help. Once you're ready to install everything, run:

pip install git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld

Alternatively, you can clone the repository and install an editable version locally:

git clone https://github.com/rlworkgroup/metaworld.git
cd metaworld
pip install -e .

Using the benchmark

Here is a list of benchmark environments for meta-RL (ML*) and multi-task RL (MT*):

  • ML1 is a meta-RL benchmark environment which tests few-shot adaptation to goal variation within a single task. You can choose to test variation within any of the 50 tasks for this benchmark.
  • ML10 is a meta-RL benchmark which tests few-shot adaptation to new tasks. It comprises 10 meta-train tasks and 5 meta-test tasks.
  • ML45 is a meta-RL benchmark which tests few-shot adaptation to new tasks. It comprises 45 meta-train tasks and 5 meta-test tasks.
  • MT10, MT1, and MT50 are multi-task RL benchmark environments for learning a multi-task policy that performs 10, 1, and 50 training tasks, respectively. MT1 is similar to ML1 because you can choose to test variation within any of the 50 tasks for this benchmark. In the original Meta-World experiments, we augment MT10 and MT50 environment observations with a one-hot vector which identifies the task. We don't enforce how users utilize the task one-hot vectors; however, one solution would be to use a Gym wrapper such as this one (a minimal sketch follows this list).
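
As an illustration, a minimal one-hot wrapper might look like the sketch below. This is only a sketch using the classic gym API; the class name and details are illustrative and are not the wrapper linked above.

import gym
import numpy as np

class OneHotTaskWrapper(gym.ObservationWrapper):
    """Append a fixed one-hot task ID to every observation (illustrative sketch)."""

    def __init__(self, env, task_index, num_tasks):
        super().__init__(env)
        self._one_hot = np.zeros(num_tasks, dtype=np.float32)
        self._one_hot[task_index] = 1.0
        # Extend the observation space to account for the appended one-hot entries.
        low = np.concatenate([env.observation_space.low, np.zeros(num_tasks)])
        high = np.concatenate([env.observation_space.high, np.ones(num_tasks)])
        self.observation_space = gym.spaces.Box(low, high, dtype=np.float32)

    def observation(self, obs):
        # Called by gym.ObservationWrapper on both reset() and step().
        return np.concatenate([obs, self._one_hot]).astype(np.float32)

Wrapping the i-th environment of an MT10/MT50 benchmark with task_index=i and num_tasks=len(benchmark.train_classes) would then yield observations augmented with the task ID.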

Basics

We provide a Benchmark API that allows constructing environments following the gym.Env interface.

To use a Benchmark, first construct it (this samples the tasks allowed for one run of an algorithm on the benchmark). Then, construct at least one instance of each environment listed in benchmark.train_classes and benchmark.test_classes. Each of those environments must then be assigned a task with env.set_task(task), drawing from benchmark.train_tasks and benchmark.test_tasks, respectively. Tasks can only be assigned to environments whose key in benchmark.train_classes or benchmark.test_classes matches task.env_name.

Please see below for some small examples using this API.

Running ML1 or MT1

import metaworld
import random

print(metaworld.ML1.ENV_NAMES)  # Check out the available environments

ml1 = metaworld.ML1('pick-place-v1') # Construct the benchmark, sampling tasks

env = ml1.train_classes['pick-place-v1']()  # Create an environment with task `pick_place`
task = random.choice(ml1.train_tasks)
env.set_task(task)  # Set task

obs = env.reset()  # Reset environment
a = env.action_space.sample()  # Sample an action
obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action

MT1 can be run the same way, except that it does not contain any test_tasks.
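
For completeness, here is a minimal MT1 sketch mirroring the ML1 example above (it assumes MT1 takes the same environment name as ML1):

import metaworld
import random

mt1 = metaworld.MT1('pick-place-v1')  # Construct the benchmark, sampling tasks

env = mt1.train_classes['pick-place-v1']()  # Create an environment
env.set_task(random.choice(mt1.train_tasks))  # MT1 has no test_tasks

obs = env.reset()  # Reset environment
a = env.action_space.sample()  # Sample an action
obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action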

Running a benchmark

Create an environment with train tasks (ML10, MT10, ML45, or MT50):

import metaworld
import random

ml10 = metaworld.ML10() # Construct the benchmark, sampling tasks

training_envs = []
for name, env_cls in ml10.train_classes.items():
  env = env_cls()
  task = random.choice([task for task in ml10.train_tasks
                        if task.env_name == name])
  env.set_task(task)
  training_envs.append(env)

for env in training_envs:
  obs = env.reset()  # Reset environment
  a = env.action_space.sample()  # Sample an action
  obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action

Create an environment with test tasks (this only works for ML10 and ML45, since MT10 and MT50 don't have a separate set of test tasks):

import metaworld
import random

ml10 = metaworld.ML10() # Construct the benchmark, sampling tasks

testing_envs = []
for name, env_cls in ml10.test_classes.items():
  env = env_cls()
  task = random.choice([task for task in ml10.test_tasks
                        if task.env_name == name])
  env.set_task(task)
  testing_envs.append(env)

for env in testing_envs:
  obs = env.reset()  # Reset environment
  a = env.action_space.sample()  # Sample an action
  obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action

Citing Meta-World

If you use Meta-World for academic research, please cite our CoRL 2019 paper using the following BibTeX entry.

@inproceedings{yu2019meta,
  title={Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning},
  author={Tianhe Yu and Deirdre Quillen and Zhanpeng He and Ryan Julian and Karol Hausman and Chelsea Finn and Sergey Levine},
  booktitle={Conference on Robot Learning (CoRL)},
  year={2019},
  eprint={1910.10897},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/1910.10897}
}

Accompanying Baselines

If you're looking for implementations of the baseline algorithms used in the Meta-World conference publication, please see our sister project, garage. Note that these aren't the exact same baselines that were used in the original conference publication; however, they are faithful to the original baselines.

Become a Contributor

We welcome all contributions to Meta-World. Please refer to the contributor's guide for how to prepare your contributions.

Acknowledgements

Meta-World is a work by Tianhe Yu (Stanford University), Deirdre Quillen (UC Berkeley), Zhanpeng He (Columbia University), Ryan Julian (University of Southern California), Karol Hausman (Google AI), Chelsea Finn (Stanford University) and Sergey Levine (UC Berkeley).

The code for Meta-World was originally based on multiworld, which is developed by Vitchyr H. Pong, Murtaza Dalal, Ashvin Nair, Shikhar Bahl, Steven Lin, Soroush Nasiriany, Kristian Hartikainen and Coline Devin. The Meta-World authors are grateful for their efforts in providing such a great framework as the foundation of our work. We would also like to thank Russell Mendonca for his work on the reward functions for some of the environments.

Comments
  • All environments produce observations outside of observation space.

    The following is a minimal working example which shows that all of the environments produce observations outside of their observation space. All it does is iterate over each environment from ML1, sample and set a task for the given environment, then take random actions in the environment and test whether or not the observations are inside the observation space, and at which indices (if any) an observation lies outside of the bounds of the observation space. You will get different results depending on the value of TIMESTEPS_PER_ENV, but setting this value to 1000 should yield violating observations for most environments. This is an issue, say, for RL implementations like RLlib which expect observations to be inside the observation space, and makes the environment incompatible with such libraries. This might be related to issue #31, though that issue only points out incorrect observation space boundaries regarding the goal coordinates, and the script below should point out that there are violations in other dimensions as well.

    import numpy as np
    from metaworld.benchmarks import ML1
    
    TIMESTEPS_PER_ENV = 1000
    
    def main():
    
        # Iterate over environment names.
        for env_name in ML1.available_tasks():
    
            # Create environment.
            env = ML1.get_train_tasks(env_name)
            tasks = env.sample_tasks(1)
            env.set_task(tasks[0])
    
            # Get boundaries of observation space and initial observation.
            low = env.observation_space.low
            high = env.observation_space.high
            obs = env.reset()
    
            # Create list of indices of observation space whose bounds are violated.
            broken_indices = []
    
            # Run environment.
            for _ in range(TIMESTEPS_PER_ENV):
    
                # Test if observation is outside observation space.
                if np.any(np.logical_or(obs < low, obs > high)):
                    current_indices = np.argwhere(np.logical_or(obs < low, obs > high))
                    current_indices = current_indices.reshape((-1,)).tolist()
                    for current_index in current_indices:
                        if current_index not in broken_indices:
                            broken_indices.append(current_index)
        
                # Sample action and perform environment step.
                a = env.action_space.sample()
                obs, reward, done, info = env.step(a)
    
            # Print out which indices of observation space were violated.
            broken_indices = sorted(broken_indices)
            print("%s broken indices: %r" % (env_name, broken_indices))
    
    if __name__ == "__main__":
        main()
    
    good first issue 
    opened by mtcrawshaw 27
  • Vectorizing Envs over Many Workers Results in Memory Overflow

    Currently, I'm using RLlib for running metaworld envs, where each worker runs many vectorized instances of the environment.

    I tried running MAML/ProMP with 40 workers and 20 envs/worker on one of the meta-envs (Push). RLlib can train on this for a couple of iterations before crashing due to memory overflow (it can't allocate more memory). I'm not sure what exactly the issue is, but do you have some leads on what it could be? I was thinking that it might be a memory leak, but trying this with a lower number of workers resulted in worse training overall but, most importantly, no crashing.

    opened by michaelzhiluo 21
  • Updated Observation Space for SawyerReachPushPickPlaceEnv

    Fix for issue #39 for a single environment. A detailed explanation of the logic behind these changes is found here: https://github.com/rlworkgroup/metaworld/issues/39#issuecomment-632422667

    opened by adibellathur 12
  • ML1 Tasks for Constant Goals

    Currently, we are trying to use specific environments in ML1 to set a goal constant per task in a MAML-setting (with env.reset() meaning that initial positions change but goal stays constant)

    However, we are not clear on what a task means in the ML1 setting. Based on the code for one of the environments we are trying to run, it seems like calling self.set_task will update self.goal. However, when the environment is reset, self._state_goal is initially self.goal but is then assigned a randomly generated goal + a concatenation of initial reacher arm positions, which also appears to be random. When self.random_init is False, it works as intended but the starting states are constant.

    We are wondering if there is a way to define a task using the Meta-World API such that, for a given task, the goal position is held constant but the initial observation changes when env.reset() is called.

    bug good first issue 
    opened by michaelzhiluo 12
  • Scripted policies for reach-push-pick-place environment

    Hard-coded policies for the sawyer reach-push-pick-place environment. Since random_init=True, a test may fail occasionally, but that hasn't happened to me yet.

    Note that these tests will fail on master, as the env is not solvable without changes from #95

    Partially addresses #90

    opened by haydenshively 11
  • Missing one-hot vector

    Hello :)

    The documentation mentions that MT10 and MT50 augment environment observations with a one-hot vector which identifies the task. When I create an MT10 instance (using the code below), I do not get the task ID. Could you please explain what I am missing?

    import metaworld
    import random
    
    mt10 = metaworld.MT10() # Construct the benchmark, sampling tasks
    
    training_envs = []
    for name, env_cls in mt10.train_classes.items():
      env = env_cls()
      task = random.choice([task for task in mt10.train_tasks
                            if task.env_name == name])
      env.set_task(task)
      training_envs.append(env)
    
    for env in training_envs:
      obs = env.reset()  # Reset environment
      a = env.action_space.sample()  # Sample an action
      obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action
    

    The shape of obs is (12,) and not (13,)

    opened by shagunsodhani 10
  • Reproducing Figure 11 and reporting success rate

    Hi all and @avnishn,

    I've been trying to reproduce results from Figure 11 in https://arxiv.org/pdf/1910.10897.pdf using https://github.com/rlworkgroup/garage/blob/08492007d6e2d9ead9beb83a8a4247e52019ac7d/metaworld_examples/sac_metaworld.py and hyper-parameters reported in Table 3. Should I use Table 3 for hyper-parameters?

    One thing which is not clear to me is how the success rate is reported. I notice that env.step returns 'success' in info, but I want to verify that this is what is reported in the paper. Here is the code that I use to report results (random actions are used for simplicity):

    from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE
    env_cls = ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE['hammer-v2-goal-observable']
    eval_env= env_cls(seed=0)
    eval_env.seed(0)
    avg_reward = 0 
    success_rate = 0 
    num_evals = 2
    
    for _ in range(num_evals):
        obs = eval_env.reset()
        done = False
        stp = 0
        while not done and stp < eval_env.max_path_length:
            obs, reward, done, info = eval_env.step(eval_env.action_space.sample())
            avg_reward += reward
            stp += 1
            if 'success' in info:
                success_rate += info['success']
    avg_reward /= num_evals
    success_rate /= num_evals
    

    Is this the right way to report the success rate as in Figure 11? Thanks for your help. Rasool
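
    For comparison, here is one common convention (an assumption on my part, not confirmed in this thread): count an episode as successful if info['success'] is ever 1 during the episode, then average over evaluation episodes.

    import numpy as np
    from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE

    env = ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE['hammer-v2-goal-observable'](seed=0)
    num_evals = 10
    successes = []
    for _ in range(num_evals):
        obs = env.reset()
        episode_success = 0.0
        for _ in range(env.max_path_length):
            obs, reward, done, info = env.step(env.action_space.sample())
            # An episode counts as a success if the success flag is ever raised.
            episode_success = max(episode_success, info.get('success', 0.0))
            if done:
                break
        successes.append(episode_success)
    success_rate = float(np.mean(successes))  # fraction of successful episodes
    print(success_rate)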

    opened by rasoolfa 8
  • Some questions about the metaworld environment

    I'm interested in your Meta-World. It will be a good benchmark for meta-RL. I have some questions about this benchmark.

    1. When an environment starts, the Sawyer arm moves to a specific pose during the first K steps (K is about 10). It seems like an 'init' function called by MuJoCo. I think this will cause problems when the agent does reinforcement learning. Is this intended?

    2. The reward gap between some environments is big. While the reach env gets a reward of almost 100 right away, the push env gets a small reward (between 0 and 1). Is this intended?

    question 
    opened by pigranya1218 8
  • Remote rendering issue with env.render(mode='rgb_array') and env.get_image()

    This is an issue I typically have with Mujoco-based simulations when running remotely. Basically rendering is problematic, even in rgb_array mode where we want to access the image frames.

    Is there a workaround for this issue? The OpenAI Gym envs have the same issue, and the only framework I know that somehow bypasses this is DM Suite, where I can render in rgb_array mode.

    The errors you would get are the following:

    • env.render(mode='rgb_array')
    GLFW error (code %d): %s 65542 b'EGL: Failed to get EGL display: Success'
    Creating window glfw
    X Error of failed request:  255
      Major opcode of failed request:  155 (GLX)
      Minor opcode of failed request:  5 (X_GLXMakeCurrent)
      Serial number of failed request:  136
      Current serial number in output stream:  137
    
    • env.get_image(width=84, height=84)
    ERROR: GLEW initalization error: Missing GL version
    
    opened by melfm 8
  • Missing Environments

    If you try running scripts/demo_sawyer.py, many of the imports don't work because of missing environments, such as from metaworld.envs.mujoco.sawyer_xyz.sawyer_stack import SawyerStackEnv

    bug 
    opened by jcoreyes 8
  • Confused about the success rate

    I am new to Meta-World! Thank you for putting together such a meaningful and complex project. I have some questions about it. In the project setting, does the success rate represent success in one step or in one episode? Because the agent runs until the max path length in one episode (e.g. 150 steps), does the success rate represent the average success over the 150 steps or the success at the last step? Can you give me some pointers?

    question 
    opened by jp18813100494 7
  • [BUG] Fix reset() method for v2

    The reset() method does not reset self._prev_obs. This creates an inconsistent reset state, which is supposed to be deterministic for the same task in the same environment (by task, I mean parametric variation).

    import metaworld
    
    benchmark = metaworld.MT1("reach-v2", 0)
    env = benchmark.train_classes["reach-v2"]()
    env.set_task(benchmark.train_tasks[0])
    
    reset_1 = env.reset()
    reset_2 = env.reset()
    _ = env.step(env.action_space.sample())
    reset_3 = env.reset()
    reset_4 = env.reset()
    
    print(reset_1 == reset_2)
    """
    [ True  True  True  True  True  True  True  True  True  True  True  True
      True  True  True  True  True  True  True  True  True  True  True  True
      True  True  True  True  True  True  True  True  True  True  True  True
      True  True  True]
    """
    
    print(reset_1 == reset_3)
    """
    [ True  True  True  True  True  True  True  True  True  True  True  True
      True  True  True  True  True  True False False False  True False False
     False False False False False  True  True  True  True  True  True  True
      True  True  True]
    """
    
    print(reset_1 == reset_4)
    """
    [ True  True  True  True  True  True  True  True  True  True  True  True
      True  True  True  True  True  True  True  True  True  True  True  True
      True  True  True  True  True  True  True  True  True  True  True  True
      True  True  True]
    """
    

    The following changes will return all True for reset_1 == reset_3.

    opened by gunnxx 0
  • [BUG] ValueError during training procedure

    Hi. I've been trying to reproduce the results of the Meta-World paper (Figure 11) with RLlib.

    Sometimes during the training procedure, a ValueError appears. The error messages look like this:

    ValueError: ('Observation ({} dtype={}) outside given space ({})!', array([ 0.54684764,  0.44018602,  0.5549992 ,  0.3511533 , -0.0165956 ,
             0.57203245,  0.01993605,  0.        ,  0.        ,  0.        ,
             1.        ,  0.        ,  0.        ,  0.        ,  0.        ,
             0.        ,  0.        ,  0.        ,  0.5476604 ,  0.44060594,
             0.55454004,  0.34979662, -0.0165956 ,  0.57203245,  0.01993605,
             0.        ,  0.        ,  0.        ,  1.        ,  0.        ,
             0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
             0.        ,  0.        ,  0.        ,  0.        ], dtype=float32), dtype('float32'), Box([-0.525   0.348  -0.0525 -1.        -inf    -inf    -inf    -inf    -inf
         -inf    -inf    -inf    -inf    -inf    -inf    -inf    -inf    -inf
      -0.525   0.348  -0.0525 -1.        -inf    -inf    -inf    -inf    -inf
         -inf    -inf    -inf    -inf    -inf    -inf    -inf    -inf    -inf
       0.      0.      0.    ], [0.525 1.025 0.7   1.      inf   inf   inf   inf   inf   inf   inf   inf
        inf   inf   inf   inf   inf   inf 0.525 1.025 0.7   1.      inf   inf
        inf   inf   inf   inf   inf   inf   inf   inf   inf   inf   inf   inf
      0.    0.    0.   ], (39,), float32))
     
    
     During handling of the above exception, another exception occurred:
      Traceback (most recent call last):
       File "/opt/anaconda3/envs/metarl2/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 725, in _worker_health_check
         ray.get(obj_ref)
       File "/opt/anaconda3/envs/metarl2/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
         return func(*args, **kwargs)
       File "/opt/anaconda3/envs/metarl2/lib/python3.9/site-packages/ray/_private/worker.py", line 2275, in get
         raise value.as_instanceof_cause()
     ray.exceptions.RayTaskError(StopIteration): ray::RolloutWorker.sample_with_count() (pid=64294, ip=163.152.162.213, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f3032a94b50>)
       File "/opt/anaconda3/envs/metarl2/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 866, in sample_with_count
         batch = self.sample()
       File "/opt/anaconda3/envs/metarl2/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 806, in sample
         batches = [self.input_reader.next()]
       File "/opt/anaconda3/envs/metarl2/lib/python3.9/site-packages/ray/rllib/evaluation/sampler.py", line 92, in next
         batches = [self.get_data()]
       File "/opt/anaconda3/envs/metarl2/lib/python3.9/site-packages/ray/rllib/evaluation/sampler.py", line 282, in get_data
         item = next(self._env_runner)
     StopIteration
    

    How can I fix this error? I think it is related to this issue.

    [Dependencies]

    python==3.9.7 torch==1.11.0+cu113 mujoco-py==2.1.2.14 mujoco==2.1.0 ray==2.0.0

    [Codes]

    
    import metaworld
    import os
    import random
    import numpy as np
    from torch.utils.tensorboard import SummaryWriter
    
    import ray
    from ray.tune.registry import register_env
    from ray.rllib.agents.ppo import PPOTrainer, PPOConfig
    from ray.tune.logger import pretty_print
    from custom_metric_callback import MyCallbacks
    import metaworld
    from metaworld.envs import (ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE,
                                ALL_V2_ENVIRONMENTS_GOAL_HIDDEN)
    
    hidden_env_names = ['assembly-v2-goal-hidden', 'basketball-v2-goal-hidden', 'bin-picking-v2-goal-hidden', 'box-close-v2-goal-hidden',
                        'button-press-topdown-v2-goal-hidden', 'button-press-topdown-wall-v2-goal-hidden',
                        'button-press-v2-goal-hidden', 'button-press-wall-v2-goal-hidden', 'coffee-button-v2-goal-hidden',
                        'coffee-pull-v2-goal-hidden', 'coffee-push-v2-goal-hidden', 'dial-turn-v2-goal-hidden',
                        'disassemble-v2-goal-hidden', 'door-close-v2-goal-hidden', 'door-lock-v2-goal-hidden', 'door-open-v2-goal-hidden',
                        'door-unlock-v2-goal-hidden', 'hand-insert-v2-goal-hidden', 'drawer-close-v2-goal-hidden',
                        'drawer-open-v2-goal-hidden', 'faucet-open-v2-goal-hidden', 'faucet-close-v2-goal-hidden', 'hammer-v2-goal-hidden',
                        'handle-press-side-v2-goal-hidden', 'handle-press-v2-goal-hidden', 'handle-pull-side-v2-goal-hidden',
                        'handle-pull-v2-goal-hidden', 'lever-pull-v2-goal-hidden', 'peg-insert-side-v2-goal-hidden',
                        'pick-place-wall-v2-goal-hidden', 'pick-out-of-hole-v2-goal-hidden', 'reach-v2-goal-hidden',
                        'push-back-v2-goal-hidden', 'push-v2-goal-hidden', 'pick-place-v2-goal-hidden', 'plate-slide-v2-goal-hidden',
                        'plate-slide-side-v2-goal-hidden', 'plate-slide-back-v2-goal-hidden',
                        'plate-slide-back-side-v2-goal-hidden', 'peg-unplug-side-v2-goal-hidden', 'soccer-v2-goal-hidden',
                        'stick-push-v2-goal-hidden', 'stick-pull-v2-goal-hidden', 'push-wall-v2-goal-hidden', 'reach-wall-v2-goal-hidden',
                        'shelf-place-v2-goal-hidden', 'sweep-into-v2-goal-hidden', 'sweep-v2-goal-hidden', 'window-open-v2-goal-hidden',
                        'window-close-v2-goal-hidden']
    
    observable_env_names = ['assembly-v2-goal-observable', 'basketball-v2-goal-observable', 'bin-picking-v2-goal-observable', 'box-close-v2-goal-observable',
                            'button-press-topdown-v2-goal-observable', 'button-press-topdown-wall-v2-goal-observable',
                            'button-press-v2-goal-observable', 'button-press-wall-v2-goal-observable', 'coffee-button-v2-goal-observable',
                            'coffee-pull-v2-goal-observable', 'coffee-push-v2-goal-observable', 'dial-turn-v2-goal-observable',
                            'disassemble-v2-goal-observable', 'door-close-v2-goal-observable', 'door-lock-v2-goal-observable', 'door-open-v2-goal-observable',
                            'door-unlock-v2-goal-observable', 'hand-insert-v2-goal-observable', 'drawer-close-v2-goal-observable',
                            'drawer-open-v2-goal-observable', 'faucet-open-v2-goal-observable', 'faucet-close-v2-goal-observable', 'hammer-v2-goal-observable',
                            'handle-press-side-v2-goal-observable', 'handle-press-v2-goal-observable', 'handle-pull-side-v2-goal-observable',
                            'handle-pull-v2-goal-observable', 'lever-pull-v2-goal-observable', 'peg-insert-side-v2-goal-observable',
                            'pick-place-wall-v2-goal-observable', 'pick-out-of-hole-v2-goal-observable', 'reach-v2-goal-observable',
                            'push-back-v2-goal-observable', 'push-v2-goal-observable', 'pick-place-v2-goal-observable', 'plate-slide-v2-goal-observable',
                            'plate-slide-side-v2-goal-observable', 'plate-slide-back-v2-goal-observable',
                            'plate-slide-back-side-v2-goal-observable', 'peg-unplug-side-v2-goal-observable', 'soccer-v2-goal-observable',
                            'stick-push-v2-goal-observable', 'stick-pull-v2-goal-observable', 'push-wall-v2-goal-observable', 'reach-wall-v2-goal-observable',
                            'shelf-place-v2-goal-observable', 'sweep-into-v2-goal-observable', 'sweep-v2-goal-observable', 'window-open-v2-goal-observable',
                            'window-close-v2-goal-observable']
    
    def env_creator_hidden(env_config):
        env_name = env_config["env"]
        SEED = env_config["seed"]
        env_cls = ALL_V2_ENVIRONMENTS_GOAL_HIDDEN[env_name]
        env = env_cls(seed=SEED)
        env.seed(SEED)
        random.seed(SEED)
        return env
    
    def env_creator_observable(env_config):
        env_name = env_config["env"]
        SEED = env_config["seed"]
        env_cls = ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_name]
        env = env_cls(seed=SEED)
        env.seed(SEED)
        random.seed(SEED)
        return env
    
    for env_name in hidden_env_names:
        register_env(env_name, env_creator_hidden)
    
    for env_name in observable_env_names:
        register_env(env_name, env_creator_observable)
    
    
    num_gpus = 8
    num_envs = len(observable_env_names)
    gpu_fractions = num_gpus / num_envs
    
    @ray.remote(num_cpus=2, num_gpus=gpu_fractions)
    def distributed_trainer(env_name):
        config = PPOConfig()
        config.training(
            gamma=0.99,
            lr=0.0005,
            train_batch_size=2000,
            model={
                "fcnet_hiddens": [128, 128],
                "fcnet_activation": "tanh",
            },
            use_gae=True,
            lambda_=0.95,
            vf_loss_coeff=0.2,
            entropy_coeff=0.001,
            num_sgd_iter=5,
            sgd_minibatch_size=64,
            shuffle_sequences=True,
        )\
            .resources(
                num_gpus=1,
                num_cpus_per_worker=1,
        )\
            .framework(
                framework='torch'
        )\
            .environment(
                env=env_name,
                env_config={"env": env_name, "seed": 1}
        )\
            .rollouts(
                num_rollout_workers=2,
                num_envs_per_worker=1,
                create_env_on_local_worker=False,
                rollout_fragment_length=250,
                horizon=500,
                soft_horizon=False,
                no_done_at_end=False,
                ignore_worker_failures=True,
                recreate_failed_workers=True,
                restart_failed_sub_environments=True,
        )\
            #.callbacks(MyCallbacks)
    
        trainer = PPOTrainer(env=env_name, config=config)
        print(f"env_name: {env_name}")
        print("ray.get_gpu_ids(): {}".format(ray.get_gpu_ids()))
        print("CUDA_VISIBLE_DEVICES: {}".format(
            os.environ["CUDA_VISIBLE_DEVICES"]))
    
        for epoch in range(10000):
            result = trainer.train()
            result.pop('info')
            result.pop('sampler_results')
            if epoch % 200 == 0:
                custom_metrics = result["custom_metrics"]
                print(
                    f"env_name: {env_name}, epoch: {epoch}, \n custom_metrics: {custom_metrics}")
                print(pretty_print(result))
                checkpoint = trainer.save()
                print("checkpoint saved at", checkpoint)
    
        return 0
    
    distributed_trainier_refs = [distributed_trainer.remote(env_name) for env_name in hidden_env_names]
    results = ray.get(distributed_trainier_refs)
    
    distributed_trainier_refs = [distributed_trainer.remote(env_name) for env_name in observable_env_names]
    results = ray.get(distributed_trainier_refs)
    
    
    opened by neverparadise 0
  • Feature request: Online updating leaderboard

    I think this would be really cool, and good to track the progress of the community. A possible service that already exists for hosting this kind of thing: https://eval.ai/web/challenges/list

    opened by ezhang7423 1
  • Why does it not succeed even when the score is high?

    I tested some environments with different task settings. For example, on reach-v2 I find it strange that the task does not succeed even when the score is 4700+. I tried to add an extra score on success, but it didn't work. Are there any tricks to solve this problem?

    opened by alexxchen 6
  • Error when setting rand_init to False in some environments

    In the V2 push-wall, pick-place-wall, push-back, and shelf-place environments, setting rand_init to False and then resetting the environment would lead to the following error when calling step():

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/mnt/beegfs/home/zchuning/metaworld/metaworld/envs/mujoco/mujoco_env.py", line 25, in inner
        return func(*args, **kwargs)
      File "/mnt/beegfs/home/zchuning/metaworld/metaworld/envs/mujoco/sawyer_xyz/v2/sawyer_push_wall_v2.py", line 79, in evaluate_state
        ) = self.compute_reward(action, obs)
      File "/mnt/beegfs/home/zchuning/metaworld/metaworld/envs/mujoco/sawyer_xyz/v2/sawyer_push_wall_v2.py", line 157, in compute_reward
        object_grasped = self._gripper_caging_reward(
      File "/mnt/beegfs/home/zchuning/metaworld/metaworld/envs/mujoco/sawyer_xyz/sawyer_xyz_env.py", line 572, in _gripper_caging_reward
        caging_xz_margin = np.linalg.norm(self.obj_init_pos[xz] - self.init_tcp[xz])
    TypeError: list indices must be integers or slices, not list
    

    This is because the reset() function of these environments sets self.obj_init_pos to self.adjust_initObjPos(self.init_config['obj_init_pos']) when self.random_init is False (e.g. line 112 in sawyer_push_wall_v2.py). However, self.adjust_initObjPos() returns a Python list instead of a numpy array, so line 572 of sawyer_xyz_env.py triggers an error by indexing a Python list with a list of indices.

    A simple fix is to wrap a np.array() around self.adjust_initObjPos(self.init_config['obj_init_pos']), similar to how some other environments (e.g. Push V2) handle calls to fix_extreme_obs_pos(). But there might be a more systematic way to fix this, hence the github issue instead of a pull request.
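
    To illustrate the failure mode and the suggested fix (the position values and index list below are illustrative, not taken from the environment code):

    import numpy as np

    obj_init_pos = [0.05, 0.6, 0.015]   # a python list, as returned by adjust_initObjPos()
    xz = [0, 2]                         # fancy-index list, as used in _gripper_caging_reward()
    # obj_init_pos[xz] would raise: TypeError: list indices must be integers or slices, not list
    obj_init_pos = np.array(obj_init_pos)  # the suggested one-line fix: convert to a numpy array
    print(obj_init_pos[xz])                # fancy indexing now works -> [0.05  0.015]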

    opened by zchuning 0
  • Incorrect reset space for object in disassemble

    The lower bound for the random reset space of the object in disassemble is higher than the upper bound for the first index. This appears to be an issue for both v1 https://github.com/rlworkgroup/metaworld/blob/18118a28c06893da0f363786696cc792457b062b/metaworld/envs/mujoco/sawyer_xyz/v1/sawyer_disassemble_peg.py#L14-L15 and v2 https://github.com/rlworkgroup/metaworld/blob/18118a28c06893da0f363786696cc792457b062b/metaworld/envs/mujoco/sawyer_xyz/v2/sawyer_disassemble_peg_v2.py#L15-L16

    np.random.uniform seems to swap the values when low > high. However, when using explicit seeding (self.np_random, seed = seeding.np_random(seed)), self.np_random.uniform raises a ValueError.

    opened by ottofabian 0