Easy-to-use micro-wrappers for Gym and PettingZoo based RL Environments

Overview

SuperSuit introduces a collection of small functions which can wrap reinforcement learning environments to do preprocessing ('microwrappers'). We support Gym for single-agent environments and PettingZoo for multi-agent environments (both AECEnv and ParallelEnv environments). Using it to convert Space Invaders to have a greyscale observation space and stack the last 4 frames looks like:

import gym
from supersuit import color_reduction_v0, frame_stack_v1

env = gym.make('SpaceInvaders-v0')

env = frame_stack_v1(color_reduction_v0(env, 'full'), 4)

Similarly, using SuperSuit with PettingZoo environments looks like

from pettingzoo.butterfly import pistonball_v0
env = pistonball_v0.env()

env = frame_stack_v1(color_reduction_v0(env, 'full'), 4)

You can install SuperSuit via pip install supersuit

Included Functions

clip_reward_v0(env, lower_bound=-1, upper_bound=1) clips rewards to between lower_bound and upper_bound. This is a popular way of handling rewards with significant variance of magnitude, especially in Atari environments.

clip_actions_v0(env) clips Box actions to be within the high and low bounds of the action space. This is a standard transformation applied to environments with continuous action spaces to keep the action passed to the environment within the specified bounds.

color_reduction_v0(env, mode='full') simplifies color information in graphical ((x,y,3) shaped) environments. mode='full' fully greyscales the observation. This can be computationally intensive. Arguments of 'R', 'G' or 'B' just take the corresponding R, G or B color channel from the observation. This is much faster and is generally sufficient.

dtype_v0(env, dtype) recasts your observation as a certain dtype. Many graphical games return uint8 observations, while neural networks generally want float16 or float32. dtype can be anything NumPy would accept as a dtype argument (e.g. np.dtype classes or strings).
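
For example, recasting a uint8 Atari observation to float32 before feeding it to a network might look like this (a minimal sketch reusing the Space Invaders environment from the overview):

import numpy as np
import gym
from supersuit import dtype_v0

env = gym.make('SpaceInvaders-v0')  # returns uint8 image observations
env = dtype_v0(env, np.float32)     # observations are now float32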

flatten_v0(env) flattens observations into a 1D array.

frame_skip_v0(env, num_frames) skips num_frames frames by reapplying old actions over and over. Observations skipped over are ignored. Rewards skipped over are accumulated. Like Gym Atari's frameskip parameter, num_frames can also be a tuple (min_skip, max_skip), which indicates a range of possible skip lengths to randomly choose from (in single-agent environments only).
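
A minimal sketch, assuming env is the Atari environment from the overview:

from supersuit import frame_skip_v0

env = frame_skip_v0(env, 4)       # repeat each action for exactly 4 frames
# or, in single-agent environments, skip a random number of frames between 2 and 5:
# env = frame_skip_v0(env, (2, 5))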

delay_observations_v0(env, delay) Delays the observation by delay frames. Before delay frames have been executed, the observation is all zeros. Along with frame_skip, this is the preferred way to implement reaction time for high-FPS games.
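
For example, a rough sketch of simulating reaction time on a high-FPS game (the specific numbers here are arbitrary):

from supersuit import frame_skip_v0, delay_observations_v0

env = frame_skip_v0(env, 4)           # act only every 4th frame
env = delay_observations_v0(env, 8)   # observations arrive 8 frames late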

sticky_actions_v0(env, repeat_action_probability) assigns a probability of an old action "sticking" to the environment and not updating as requested. This is to prevent agents from learning predefined action patterns in highly deterministic games like Atari. Note that the stickiness is cumulative, so an action has a repeat_action_probability^2 chance of sticking for two turns in a row, etc. This is the recommended way of adding randomness to Atari, per Machado et al. (2018), "Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents".
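
Machado et al. (2018) use a repeat probability of 0.25 for Atari; a minimal sketch:

from supersuit import sticky_actions_v0

env = sticky_actions_v0(env, repeat_action_probability=0.25)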

frame_stack_v1(env, num_frames=4) stacks the most recent frames. For vector games observed via plain vectors (1D arrays), the output is just concatenated to a longer 1D array. 2D or 3D arrays are stacked to be taller 3D arrays. At the start of the game, frames that don't yet exist are filled with 0s. num_frames=1 is analogous to not using this function.

max_observation_v0(env, memory) the resulting observation becomes the max over memory number of prior frames. This is important for Atari environments, as many games have elements that are intermittently flashed on the screen instead of being constant, due to the peculiarities of the console and CRT TVs. The OpenAI baselines MaxAndSkip Atari wrapper is equivalent to doing memory=2 and then a frame_skip of 4.
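
Reproducing that MaxAndSkip behaviour would look roughly like:

from supersuit import max_observation_v0, frame_skip_v0

env = max_observation_v0(env, 2)  # take the max over the last 2 frames
env = frame_skip_v0(env, 4)       # then repeat each action for 4 frames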

normalize_obs_v0(env, env_min=0, env_max=1) linearly scales observations to the range env_min (default 0) to env_max (default 1), given the known minimum and maximum observation values defined in the observation space. Only works on Box observations with float32 or float64 dtypes and finite bounds. If you wish to normalize another type, you can first apply the dtype wrapper to convert your type to float32 or float64.
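
For a uint8 image environment, this means applying the dtype wrapper first; a minimal sketch:

import numpy as np
from supersuit import dtype_v0, normalize_obs_v0

env = dtype_v0(env, np.float32)                    # convert uint8 observations to float32
env = normalize_obs_v0(env, env_min=0, env_max=1)  # then scale them to [0, 1]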

reshape_v0(env, shape) reshapes observations into given shape.

resize_v0(env, x_size, y_size, linear_interp=False) Performs interpolation to up-size or down-size the observation image, using area interpolation by default. Linear interpolation is also available by setting linear_interp=True (it's faster and better for up-sizing). This wrapper is only available for 2D or 3D observations, and only makes sense if the observation is an image.

nan_noop_v0(env) If an action is a NaN value for a step, this wrapper will trigger a warning and perform a no operation action in its place. The noop action is accepted as an argument in the step(action, no_op_action) function.

nan_zeros_v0(env) If an action is a NaN value for a step, this wrapper will trigger a warning and perform a zeros action in its place.

nan_random_v0(env) If an action is a NaN value for a step, this wrapper will trigger a warning and perform a random action in its place. The random action will be retrieved from the action mask.

scale_actions_v0(env, scale) Scales the high and low bounds of the action space by the scale argument in init(). Additionally, scales any actions by the same value when step() is called.
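
A minimal sketch (the scale factor here is arbitrary):

from supersuit import scale_actions_v0

env = scale_actions_v0(env, 0.5)  # action space bounds and incoming actions are both scaled by 0.5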

Included Multi-Agent Only Functions

agent_indicator_v0(env, type_only=False) Adds an indicator of the agent ID to the observation; only supports Discrete and 1D, 2D, and 3D Box spaces. For 1D spaces, the agent ID is converted to a one-hot vector and appended to the observation (increasing the size of the observation space as necessary). 2D and 3D spaces are treated as images (with channels last) and the ID is converted to n additional channels, with the channel that represents the ID as all 1s and the other channels as all 0s (a sort of one-hot encoding). This allows MADRL methods like parameter sharing to learn policies for heterogeneous agents, since the policy can tell what agent it's acting on. Set type_only=True to parse the name of the agent as <type>_<n> and have the appended one-hot vector only identify the type, rather than the specific agent name. This would, for example, give all agents on the red team in the MAgent battle environment the same agent indicator. This is useful for games where there are many agents in an environment but few types of agents. Agent indication for MADRL was first introduced in Cooperative Multi-Agent Control Using Deep Reinforcement Learning.
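
A minimal sketch of the type_only behaviour, assuming env is a multi-agent environment whose agents are named <type>_<n> (e.g. red_0, red_1, blue_0 in MAgent battle):

import supersuit as ss

# the appended one-hot indicator identifies only the agent's type (e.g. "red" vs "blue"),
# not the individual agent
env = ss.agent_indicator_v0(env, type_only=True)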

black_death_v2(env) Instead of removing dead agents, their observations and rewards are 0 and their actions are ignored. This can simplify handling agent death mechanics. The name "black death" does not come from the plague, but from the fact that you see a black image (an image filled with zeros) when you die.

pad_action_space_v0(env) pads the action spaces of all agents to be the same as the biggest, per the algorithm posed in Parameter Sharing is Surprisingly Useful for Deep Reinforcement Learning. This enables MARL methods that require homogeneous action spaces for all agents to work with environments with heterogeneous action spaces. Discrete actions inside the padded region will be set to zero, and Box actions will be cropped down to the original space.

pad_observations_v0(env) pads observations to be of the shape of the largest observation of any agent with 0s, per the algorithm posed in Parameter Sharing is Surprisingly Useful for Deep Reinforcement Learning. This enables MARL methods that require homogeneous observations from all agents to work in environments with heterogeneous observations. This currently supports Discrete and Box observation spaces.
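
A sketch of preparing a heterogeneous environment for parameter sharing, assuming env is a PettingZoo environment whose agents have differing spaces:

import supersuit as ss

env = ss.pad_observations_v0(env)   # zero-pad observations to the largest observation shape
env = ss.pad_action_space_v0(env)   # pad action spaces to the largest action space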

Required multiagent environment attributes

Many wrappers require an environment to support the optional possible_agents attribute. These are required because the wrapper needs to know all the spaces in advance. The following is a complete list of wrappers which require these attributes:

  • black_death_v2
  • pad_action_space_v0
  • pad_observations_v0
  • agent_indicator_v0
  • pettingzoo_env_to_vec_env_v0
  • vectorize_aec_env_v0

Environment Vectorization

These functions turn plain Gym environments into vectorized environments, for every common vector environment spec.

gym_vec_env_v0(env, num_envs, multiprocessing=False) creates a Gym vector environment with num_envs copies of the environment. If multiprocessing is True, AsyncVectorEnv is used instead of SyncVectorEnv.
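
For example, a minimal sketch making 8 copies of the Space Invaders environment from the overview:

import gym
import supersuit as ss

env = gym.make('SpaceInvaders-v0')
venv = ss.gym_vec_env_v0(env, 8)  # SyncVectorEnv with 8 copies
# venv = ss.gym_vec_env_v0(env, 8, multiprocessing=True)  # AsyncVectorEnv instead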

stable_baselines_vec_env_v0(env, num_envs, multiprocessing=False) creates a stable_baselines vector environment with num_envs copies of the environment. If multiprocessing is True, SubprocVecEnv is used instead of DummyVecEnv. Needs stable_baselines to be installed to work.

stable_baselines3_vec_env_v0(env, num_envs, multiprocessing=False) creates a stable_baselines3 vector environment with num_envs copies of the environment. If multiprocessing is True, SubprocVecEnv is used instead of DummyVecEnv. Needs stable_baselines3 to be installed to work.

concat_vec_envs_v1(vec_env, num_vec_envs, num_cpus=0, base_class='gym') takes in a vec_env, which is a vector environment (it should not have multithreading enabled). Creates a new vector environment with num_vec_envs copies of that vector environment concatenated together, and runs them on num_cpus CPUs, balanced as evenly as possible between CPUs. num_cpus=0 or num_cpus=1 means to create 0 new threads, i.e. run the process in an efficient single-threaded manner. A use case for this function is given below. If the base class of the resulting vector environment matters, as it does for stable baselines, you can use the base_class parameter to switch between the "gym" base class and "stable_baselines3"'s base class. Note that both have identical functionality.

Parallel Environment vectorization

Note that a multi-agent environment has a similar interface to a vector environment. Give each possible agent an index in the vector and the vector of agents can be interpreted as a vector of "environments":

agent_1
agent_2
agent_3
...

Where each agent's observation, reward, done, and info will be that environment's data.

The following function performs this conversion.

pettingzoo_env_to_vec_env_v0(env): Takes a PettingZoo ParallelEnv with the following assumptions: no agent death or generation, homogeneous action and observation spaces. Returns a gym vector environment where each "environment" in the vector represents one agent. An arbitrary PettingZoo parallel environment can be enforced to have these assumptions by wrapping it with the pad_action_space, pad_observations, and black_death wrappers. This conversion to a vector environment can be used to train appropriate PettingZoo environments with standard single-agent RL methods such as stable baselines' A2C out of the box (example below).

You can also use the concat_vec_envs_v1 functionality to train on several vector environments in parallel, forming a vector which looks like

env_1_agent_1
env_1_agent_2
env_1_agent_3
env_2_agent_1
env_2_agent_2
env_2_agent_3
...

So you can, for example, train 8 copies of PettingZoo's pistonball environment in parallel with code like:

from stable_baselines3 import PPO
from pettingzoo.butterfly import pistonball_v4
import supersuit as ss
env = pistonball_v4.parallel_env()
env = ss.color_reduction_v0(env, mode='B')
env = ss.resize_v0(env, x_size=84, y_size=84)
env = ss.frame_stack_v1(env, 3)
env = ss.pettingzoo_env_to_vec_env_v0(env)
env = ss.concat_vec_envs_v1(env, 8, num_cpus=4, base_class='stable_baselines3')
model = PPO('CnnPolicy', env, verbose=3, n_steps=16)
model.learn(total_timesteps=2000000)

vectorize_aec_env_v0(aec_env, num_envs, num_cpus=0) creates an AEC Vector env (API documented in source here). num_cpus=0 indicates that the process will run in a single thread. Values of 1 or more will spawn at most that number of processes.
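
A minimal sketch, using the AEC version of the pistonball environment from the example above:

from pettingzoo.butterfly import pistonball_v4
import supersuit as ss

env = pistonball_v4.env()                  # AEC API environment
vec_env = ss.vectorize_aec_env_v0(env, 4)  # 4 copies, run in a single thread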

Note on multiprocessing

Turning on multiprocessing runs each environment in its own process. Turning this on is typically much slower for fast environments (like card games), but much faster for slow environments (like robotics simulations). Determining which case applies to you will require testing.

On macOS with Python 3.8 or higher, you will need to change the default multiprocessing setting to use fork multiprocessing instead of spawn multiprocessing, as shown below, before the multiprocessing environment is created.

import multiprocessing
multiprocessing.set_start_method("fork")

Lambda Functions

If none of the included micro-wrappers are suitable for your needs, you can use a lambda function (or submit a PR).

action_lambda_v1(env, change_action_fn, change_space_fn) allows you to define arbitrary changes to the actions via change_action_fn(action, space) : action and to the action spaces with change_space_fn(action_space) : action_space. Remember that you are transforming the actions received by the wrapper to the actions expected by the base environment. In multi-agent environments only, the lambda functions can optionally accept an extra agent parameter, which lets you know the agent name of the action/action space, e.g. change_action_fn(action, space, agent) : action.

observation_lambda_v0(env, observation_fn, observation_space_fn) allows you to define arbitrary changes to the observations via observation_fn(observation, obs_space) : observation, and to the observation spaces with observation_space_fn(obs_space) : obs_space. For Box-to-Box transformations, if observation_space_fn is None the space transformation will be inferred from observation_fn by passing the high and low bounds of the space through it. In multi-agent environments only, the lambda functions can optionally accept an agent parameter, which lets you know the agent name of the observation/observation space, e.g. observation_fn(observation, obs_space, agent) : observation.

reward_lambda_v0(env, change_reward_fn) allows you to make arbitrary changes to rewards by passing in a change_reward_fn(reward) : reward function. For Gym environments this is called every step to transform the returned reward. For AECEnv, this function is used to change each element in the rewards dictionary every step.
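
For example, the reward clipping performed by clip_reward_v0 could also be expressed with a reward lambda (a minimal sketch):

import numpy as np
from supersuit import reward_lambda_v0

env = reward_lambda_v0(env, lambda reward: np.clip(reward, -1, 1))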

Lambda Function Examples

Adding noise to a Box observation looks like:

env = observation_lambda_v0(env, lambda x : x + np.random.normal(size=x.shape))

Adding noise to a Box observation and increasing the high and low bounds to accommodate this extra noise looks like:

env = observation_lambda_v0(env,
    lambda x : x + np.random.normal(size=x.shape),
    lambda obs_space : gym.spaces.Box(obs_space.low-5,obs_space.high+5))

Changing a 1D Box action space to a Discrete space by mapping the discrete actions to one-hot vectors looks like:

def one_hot(x,n):
    v = np.zeros(n)
    v[x] = 1
    return v

env = action_lambda_v1(env,
    lambda action, act_space : one_hot(action, act_space.shape[0]),
    lambda act_space : gym.spaces.Discrete(act_space.shape[0]))

Note that many of the SuperSuit wrappers are implemented with a lambda wrapper behind the scenes. See here for some examples.

Citation

If you use this in your research, please cite:

@article{SuperSuit,
  Title = {SuperSuit: Simple Microwrappers for Reinforcement Learning Environments},
  Author = {Terry, Justin K and Black, Benjamin and Hari, Ananth},
  journal={arXiv preprint arXiv:2008.08932},
  year={2020}
}
Comments
  • Multiprocessing in SuperSuit

    Multiprocessing in SuperSuit

    Based on the template for multiprocessing in SB3 I decided to check if I could use multiprocessing in SuperSuit. Here are my files: petting bubble rl.zip

    from stable_baselines3.ppo import MlpPolicy
    from stable_baselines3 import PPO
    import supersuit as ss
    from petting_bubble_env_continuous import PettingBubblesEnvironment
    import numpy as np
    import time
    import os
    
    print("{} cpus available".format(os.cpu_count()))
    args = [3, 3, 5, 20]
    n_timesteps = int(8e3)
    
    #single process
    env_single = PettingBubblesEnvironment(*args)
    env_single = ss.black_death_v1(env_single)
    env_single = ss.pettingzoo_env_to_vec_env_v0(env_single)
    env_single = ss.concat_vec_envs_v0(env_single, 8, num_cpus=1, base_class='stable_baselines3')
    model = PPO(MlpPolicy, env_single, verbose=0, gamma=0.995, ent_coef=0.01, learning_rate=2.5e-5, vf_coef=0.5,
                max_grad_norm=0.5, gae_lambda=0.95, n_epochs=4, clip_range=0.2, clip_range_vf=1)
    
    start_time = time.time()
    model.learn(total_timesteps=n_timesteps)
    total_time_single = time.time()-start_time
    print(f"Took {total_time_single:.2f}s for single process version - {n_timesteps / total_time_single:.2f} FPS")
    
    #multiprocessing
    env_multi = PettingBubblesEnvironment(*args)
    env_multi = ss.black_death_v1(env_multi)
    env_multi = ss.pettingzoo_env_to_vec_env_v0(env_multi)
    env_multi = ss.concat_vec_envs_v0(env_multi, 8, num_cpus=8, base_class='stable_baselines3')
    model = PPO(MlpPolicy, env_multi, verbose=0, gamma=0.995, ent_coef=0.01, learning_rate=2.5e-5, vf_coef=0.5,
                max_grad_norm=0.5, gae_lambda=0.95, n_epochs=4, clip_range=0.2, clip_range_vf=1)
    
    start_time = time.time()
    model.learn(total_timesteps=n_timesteps)
    total_time_multi = time.time()-start_time
    print(f"Took {total_time_multi:.2f}s for multiprocessed version - {n_timesteps / total_time_multi:.2f} FPS")
    

    However, the version with multiprocessing fails...

    pygame 2.0.1 (SDL 2.0.14, Python 3.8.8)
    Hello from the pygame community. https://www.pygame.org/contribute.html
    16 cpus available
    Took 171.35s for single process version - 46.69 FPS
    Traceback (most recent call last):
      File "C:/Users/pedro/OneDrive/Documentos/2021 Learning Matters/petting bubble rl/petting_bubble_multi_test.py", line 30, in <module>
        env_multi = ss.concat_vec_envs_v0(env_multi, 8, num_cpus=8, base_class='stable_baselines3')
      File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\site-packages\supersuit\vector_constructors.py", line 37, in concat_vec_envs
        vec_env = MakeCPUAsyncConstructor(num_cpus)(*vec_env_args(vec_env, num_vec_envs))
      File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\site-packages\supersuit\vector\constructors.py", line 38, in constructor
        return ProcConcatVec(cat_env_fns, obs_space, act_space, num_fns * envs_per_env, example_env.metadata)
      File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\site-packages\supersuit\vector\multiproc_vec.py", line 83, in __init__
        proc.start()
      File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\process.py", line 121, in start
        self._popen = self._Popen(self)
      File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\context.py", line 224, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\context.py", line 327, in _Popen
        return Popen(process_obj)
      File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
        reduction.dump(process_obj, to_child)
      File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    AttributeError: Can't pickle local object 'vec_env_args.<locals>.env_fn'
    Exception ignored in: <function ProcConcatVec.__del__ at 0x000001C30FEBF700>
    Traceback (most recent call last):
      File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\site-packages\supersuit\vector\multiproc_vec.py", line 147, in __del__
        for pipe in self.pipes:
    AttributeError: 'ProcConcatVec' object has no attribute 'pipes'
    
    Process finished with exit code 1
    
    opened by p-veloso 40
  • Frame stack

    Frame stack

    Modified frame_stack.py (same as stable-baselines3). Original: fill the observation stack with zero observations. Modified: fill the observation stack with copies of the first observation.

    Modified aec_mock_test to reflect the modified frame stack.

    opened by jackyoung96 19
  • AttributeError: 'SharedArray' object has no attribute 'dtype'

    AttributeError: 'SharedArray' object has no attribute 'dtype'

    When I try the pistonball tutorial, I get this attribute error, but I have no idea what I should do. What should I do to deal with the error?

    opened by SOMEAIDI 14
  • Truncation API Update

    Truncation API Update

    About

    As part of the Gym update to 0.25, the following changes have been made:

    done -> termination and truncation

    The singular done signal has been changed to a termination and truncation signal, where termination dictates that the environment has ended due to meeting certain conditions, and truncation dictates that the environment has ended due to exceeding the time/frame limit

    Progress

    The road plan for this update is as follows:

    1. Truncation/Termination update
      • [ ] AEC Vector
      • [ ] Generic Wrappers
      • [x] Lambda Wrappers
      • [ ] Multiagent Wrappers
      • [ ] Utils
      • [ ] Vector
    2. Tests update
      • [x] vec_env_test.py
      • [x] test_autodep.py
      • [ ] pettingzoo_api_test.py
      • [x] parallel_env_test.py
      • [ ] gym_unwrapped_test.py
      • [ ] gym_mock_test.py
      • [ ] generated_agents_test.py
      • [x] aec_unwrapped_test.py
      • [x] aec_mock_test.py
      • [ ] test_vector/
      • [ ] test_utils/
    opened by jjshoots 11
  • Fix for a) Seeding problems from gym env_checker b) infos API change

    Fix for a) Seeding problems from gym env_checker b) infos API change

    There are two issues that the new gym release has brought to Supersuit.

    ~~a) Seeding issues~~

    ~~Because of the new env_checker in gym, when env_checker is enabled on gym.make() (which is done by default), the environment gets unintentionally seeded.~~ ~~As a result, any vec envs derived from single envs end up having the same seed even when the seed given to them was None.~~ ~~The current fix just reseeds any vec envs that are derived from single envs.~~

    ~~Although this works, I don't quite like this solution as it seems very hacky(?), though I may be educated otherwise. I think the better solution is for env_checker to reset the env's seed instead of having it be done outside, which is probably the expected behaviour of env_checker (checking without modifying).~~

    b) Infos API change

    Because of the recent infos API change in gym 0.24.0, the infos style for supersuit vec env wrappers is now no longer consistent with the infos style of gym vec envs.

    If I understand things correctly, there are two wrappers that need to be fixed:

    ConcatVecEnv

    I've implemented a simple hacky fix for this, though I think my hacky fix is still pretty suboptimal and could take some ideas from people.

    ProcConcatVecEnv

    I've not touched this wrapper yet mostly because I don't really know what's going on with it yet.

    opened by jjshoots 11
  • Possible problem related to `pettingzoo_env_to_vec_env_v1` and `reset()` function

    Possible problem related to `pettingzoo_env_to_vec_env_v1` and `reset()` function

    I am having trouble making things work with a custom ParallelEnv I wrote using PettingZoo. I am using SuperSuit's ss.pettingzoo_env_to_vec_env_v1(env) as a wrapper to vectorize the environment and make it work with Stable-Baselines3 as documented here.

    You can find attached a summary of the most relevant part of the code:

    from typing import Optional
    from gym import spaces
    import numpy as np
    from pettingzoo import ParallelEnv
    import supersuit as ss
    
    
    def env(**kwargs):
        env_ = parallel_env(**kwargs)
        env_ = ss.pettingzoo_env_to_vec_env_v1(env_)
        return env_
    
    
    petting_zoo = env
    
    
    class parallel_env(ParallelEnv):
        metadata = {'render_modes': ['ansi'], "name": "PlayerEnv-v0"}
    
        def __init__(self, n_agents, new_step_api: bool = True) -> None:
            # [...]
            self.possible_agents = [
                f"player_{idx}" for idx in range(n_agents)]
    
            self.agents = self.possible_agents[:]
    
            self.agent_name_mapping = dict(
                zip(self.possible_agents, list(range(len(self.possible_agents))))
            )
    
            self.observation_spaces = spaces.Dict(
                {agent: spaces.Box(shape=(20,), dtype=np.float64, low=0.0, high=1.0)
                 for agent in self.possible_agents}
            )
    
            self.action_spaces = spaces.Dict(
                {agent: spaces.Box(low=0, high=4, shape=(1,), dtype=np.int32)
                 for agent in self.possible_agents}
            )
    
        def observation_space(self, agent):
            return self.observation_spaces[agent]
    
        def action_space(self, agent):
            return self.action_spaces[agent]
    
        def __calculate_observation(self, idx_player: int) -> np.ndarray:
            # Calculate the observation for the given player (just an example)
            observation = np.zeros(20)
            return observation
    
        def __calculate_observations(self) -> np.ndarray:
            """
            This method returns the observations for all players.
            """
    
            observations = {
                agent: self.__calculate_observation(
                    idx_player=self.agent_name_mapping[agent])
                for agent in self.agents
            }
            return observations
    
        def observe(self, agent):
            i = self.agent_name_mapping[agent]
            return self.__calculate_observation(idx_player=i)
    
        def step(self, actions):
            observations = self.__calculate_observations()
            rewards = self.__calculate_rewards()  # As example
            self._episode_ended = self.__check_episode_ended()  # As example
    
            if self._episode_ended:
                infos = {agent: {} for agent in self.agents}
                dones = {agent: self._episode_ended for agent in self.agents}
                rewards = {
                    self.agents[i]: rewards[i]
                    for i in range(len(self.agents))
                }
                self.agents = {}  # To satisfy `set(par_env.agents) == live_agents`
    
            else:
                infos = {agent: {"discount": 1.0} for agent in self.agents}
                dones = {agent: self._episode_ended for agent in self.agents}
                rewards = {
                    self.agents[i]: rewards[i]
                    for i in range(len(self.agents))
                }
    
            return observations, rewards, dones, infos
    
        def reset(self,
                  seed: Optional[int] = None,
                  return_info: bool = False,
                  options: Optional[dict] = None,):
            # Reset the environment and get observation from each player.
            observations = self.__calculate_observations()
            return observations
    

    Unfortunately when I try to test with the following main procedure:

    import gym
    from player_env import player_env
    from stable_baselines3.common.env_checker import check_env
    from pettingzoo.test import parallel_api_test
    
    
    if __name__ == '__main__':
        # Environment initialization
        env = player_env.petting_zoo(agents=10)
        parallel_api_test(env)  # Works
        check_env(env)  # Throws error
    

    I get the following error:

    AssertionError: The observation returned by the `reset()` method does not match the given observation space
    

    It seems that ss.pettingzoo_env_to_vec_env_v1(env) is capable of splitting the parallel environment into multiple vectorized ones, but not for the reset() function.

    Does anyone know how to fix this problem?

    opened by PieroMacaluso 10
  • Supersuit refactor

    Supersuit refactor

    • Did not change tests
    • Most wrappers are now simply lambda wrappers. Hopefully this provides a nice example for people who want to create custom wrappers. These are located in the basic_wrappers.py file
    • For some wrappers, the lambda wrapper API is insufficient because the state of the wrapper needs to store agent specific information and change on resets. For these environments, I created a modifier class which is instantiated for each agent. This modifier class had reset(), modify_observation(), etc, and is used by a shared_wrapper which simply uses this modifier class to implement the logic of the wrapper. These are located in the more_wrappers.py file.

    @justinkterry Please check that this meets your needs before merging.

    opened by benblack769 8
  • Support vectorized video rendering

    Support vectorized video rendering

    Hello, this PR enables the video recording through VecVideoRecorder from sb3. Below is an example code:

    from stable_baselines3 import PPO
    from pettingzoo.butterfly import pistonball_v4
    from stable_baselines3.common.vec_env import VecVideoRecorder
    import supersuit as ss
    import time
    env = pistonball_v4.parallel_env()
    env = ss.color_reduction_v0(env, mode='B')
    env = ss.resize_v0(env, x_size=84, y_size=84)
    env = ss.frame_stack_v1(env, 3)
    env = ss.pettingzoo_env_to_vec_env_v0(env)
    env = ss.concat_vec_envs_v0(env, 8, num_cpus=1, base_class='stable_baselines3')
    env = VecVideoRecorder(
        env, f'videos/{time.time()}',
        record_video_trigger=lambda x: x % 1000 == 0, video_length=20)
    model = PPO('CnnPolicy', env, verbose=3, n_steps=16, device="cpu")
    model.learn(total_timesteps=2000000)
    

    It will be great to incorporate this code. Being selfish for a second, this change will help me record the videos of agents playing the game throughout training, much like my experiments with the procgen env. https://wandb.ai/cleanrl/cleanrl.benchmark/reports/Procgen-New--Vmlldzo0NDUyMTg


    opened by vwxyzjn 8
  • Preprocessing order for graphical environments

    Preprocessing order for graphical environments

    Hi! More of a usage/best practice question than a package issue. I am using Knights Archers Zombies as an env to train MARL algorithms in RLlib. Because KAZ is a graphical env, I have to preprocess it to work with RLlib's default conv filter size (512x512x3 -> 84x84xK).

    I am using the following wrappers:

    • color reduction
    • frame stack
    • resize
    • dtype
    • normalize

    Is there a heuristic for determining the order in which these should be applied? Of which resize is

    opened by yutaizhou 8
  • NotImplementedError: Calling seed externally is deprecated; call reset(seed=seed) instead

    NotImplementedError: Calling seed externally is deprecated; call reset(seed=seed) instead

    Hi,

    When I run the below sample code from the README.md file, I get the following error: NotImplementedError: Calling seed externally is deprecated; call reset(seed=seed) instead

    This is the code sample that I run:

    from stable_baselines3 import PPO
    from pettingzoo.butterfly import pistonball_v6
    import supersuit as ss
    env = pistonball_v6.parallel_env()
    env = ss.color_reduction_v0(env, mode='B')
    env = ss.resize_v0(env, x_size=84, y_size=84)
    env = ss.frame_stack_v1(env, 3)
    env = ss.pettingzoo_env_to_vec_env_v1(env)
    env = ss.concat_vec_envs_v1(env, 8, num_cpus=4, base_class='stable_baselines3')
    model = PPO('CnnPolicy', env, verbose=3, n_steps=16)
    model.learn(total_timesteps=2000000)
    

    I am using this with supersuit==3.3.3 and pettingzoo==1.8.1

    I should also note that I am using gym==0.21.0 (this is the latest version stable-baselines3 is compatible with)

    I should also note that there is a similar issue here however that was inactive and had been closed without being resolved.

    Thanks

    Here is the full traceback:

    Traceback (most recent call last):
      File "C:/Users/timf3/PycharmProjects/SSD2/run_scripts/testing_supersuit.py", line 11, in <module>
        env = ss.concat_vec_envs_v1(env, 8, num_cpus=4, base_class='stable_baselines3')
      File "C:\Users\timf3\PycharmProjects\SSD2\venv\lib\site-packages\supersuit\vector\vector_constructors.py", line 49, in concat_vec_envs_v1
        vec_env = MakeCPUAsyncConstructor(num_cpus)(*vec_env_args(vec_env, num_vec_envs))
      File "C:\Users\timf3\PycharmProjects\SSD2\venv\lib\site-packages\supersuit\vector\constructors.py", line 20, in constructor
        example_env = env_fn_list[0]()
      File "C:\Users\timf3\PycharmProjects\SSD2\venv\lib\site-packages\supersuit\vector\vector_constructors.py", line 11, in env_fn
        env.seed(None)
      File "C:\Users\timf3\PycharmProjects\SSD2\venv\lib\site-packages\supersuit\vector\markov_vector_wrapper.py", line 32, in seed
        self.par_env.seed(seed)
      File "C:\Users\timf3\PycharmProjects\SSD2\venv\lib\site-packages\supersuit\generic_wrappers\utils\shared_wrapper_util.py", line 89, in seed
        super().seed(seed)
      File "C:\Users\timf3\PycharmProjects\SSD2\venv\lib\site-packages\pettingzoo\utils\env.py", line 251, in seed
        "Calling seed externally is deprecated; call reset(seed=seed) instead"
    NotImplementedError: Calling seed externally is deprecated; call reset(seed=seed) instead
    opened by timf34 7
  • WIP: Move seed to reset()

    WIP: Move seed to reset()

    This is a WIP in support of a related PR on PettingZoo. I expect this to fail tests since neither PettingZoo nor Gym has been updated to have the seed parameter on the reset function.

    opened by elbamos 7
  • Fix typo: `BaseParallelWraper` renamed to `BaseParallelWrapper`

    Fix typo: `BaseParallelWraper` renamed to `BaseParallelWrapper`

    Description

    As title, fix typo of BaseParallelWrapper (previously BaseParallelWraper). Linked to its PettingZoo counterpart https://github.com/Farama-Foundation/PettingZoo/pull/876.

    opened by mikcnt 0
  • Accessing an env attribute from wrappers

    Accessing an env attribute from wrappers

    Hi all, I have a custom PettingZoo ParallelEnv, that is then converted to a SB3VecEnvWrapper through

    vec_env = ss.pettingzoo_env_to_vec_env_v1(env)
    vec_env = ss.concat_vec_envs_v1(vec_env, num_envs, num_cpus = 4, base_class = "stable_baselines3")
    

    Now, I need to access an attribute of env, and I am trying with vec_env.get_attr(name) (that SB3VecEnvWrapper should inherit from stable_baselines3's VecEnvWrapper), but I get an error:

    AttributeError: 'ConcatVecEnv' object has no attribute 'get_attr'

    ConcatVecEnv is inheriting from gym VectorEnv, that indeed has a get_attr(name) method. So, what am I doing wrong here?

    My setting the following:

    • PettingZoo==1.17.0
    • SuperSuit==3.3.3
    • stable_baselines3==1.6.0
    • gym==0.21.0
    opened by opocaj92 0
  • [Termination Truncation Update] Refactor multiproc code

    [Termination Truncation Update] Refactor multiproc code

    Summary

    My debugging skills and I have reached an impasse, and I believe this debug difficulty is pretty much the same thing that caused Jat some grief.

    When running tests, there is a failure in env.reset() somewhere. In Jat's PR, https://github.com/Farama-Foundation/SuperSuit/pull/165, Ben suggests the issue is related to:

    To get reset() to return info, you will also need to extract the infos returned here https://github.com/Farama-Foundation/SuperSuit/blob/master/supersuit/vector/multiproc_vec.py#L179

    The above line referenced is the same line that's raising errors. It's a shape mismatch (expected less params than a func returns), but that's all I can infer from the traceback and Pycharm debugger.

    Test tracebacks

    1

    test/test_vector/test_gym_vector.py:54 (test_gym_supersuit_equivalency)
    def test_gym_supersuit_equivalency():
            env = gym.make("MountainCarContinuous-v0")
            num_envs = 3
            venv1 = concat_vec_envs_v1(env, num_envs)
            venv2 = gym_vec_env_v0(env, num_envs)
    >       check_vec_env_equivalency(venv1, venv2)
    
    test/test_vector/test_gym_vector.py:60: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    test/test_vector/test_gym_vector.py:38: in check_vec_env_equivalency
        obs1 = venv1.reset(seed=51)
    supersuit/vector/concat_vec_env.py:46: in reset
        return self.concat_obs(_res_obs)
    supersuit/vector/concat_vec_env.py:68: in concat_obs
        return concatenate(
    /usr/lib/python3.9/functools.py:888: in wrapper
        return dispatch(args[0].__class__)(*args, **kw)
    venv/lib/python3.9/site-packages/gym/vector/utils/numpy_utils.py:50: in _concatenate_base
        return np.stack(items, axis=0, out=out)
    <__array_function__ internals>:180: in stack
        ???
    venv/lib/python3.9/site-packages/numpy/core/shape_base.py:433: in stack
        return _nx.concatenate(expanded_arrays, axis=axis, out=out)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    args = ([array([[array([-0.41151175,  0.        ], dtype=float32), {}]],
          dtype=object), array([[array([-0.4768474,  0. ... ], dtype=float32), {}]], dtype=object), array([[array([-0.5977746,  0.       ], dtype=float32), {}]], dtype=object)],)
    kwargs = {'axis': 0, 'out': array([[0., 0.],
           [0., 0.],
           [0., 0.]], dtype=float32)}
    relevant_args = [array([[array([-0.41151175,  0.        ], dtype=float32), {}]],
          dtype=object), array([[array([-0.4768474,  0.  ...,  0.       ], dtype=float32), {}]], dtype=object), array([[0., 0.],
           [0., 0.],
           [0., 0.]], dtype=float32)]
    
    >   ???
    E   TypeError: Cannot cast array data from dtype('O') to dtype('float32') according to the rule 'same_kind'
    
    <__array_function__ internals>:180: TypeError
    

    2

    test/test_vector/test_gym_vector.py:62 (test_inital_state_dissimilarity)
    def test_inital_state_dissimilarity():
            env = gym.make("CartPole-v1")
            venv = concat_vec_envs_v1(env, 2)
    >       observations = venv.reset()
    
    test/test_vector/test_gym_vector.py:66: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    supersuit/vector/concat_vec_env.py:46: in reset
        return self.concat_obs(_res_obs)
    supersuit/vector/concat_vec_env.py:68: in concat_obs
        return concatenate(
    /usr/lib/python3.9/functools.py:888: in wrapper
        return dispatch(args[0].__class__)(*args, **kw)
    venv/lib/python3.9/site-packages/gym/vector/utils/numpy_utils.py:50: in _concatenate_base
        return np.stack(items, axis=0, out=out)
    <__array_function__ internals>:180: in stack
        ???
    venv/lib/python3.9/site-packages/numpy/core/shape_base.py:433: in stack
        return _nx.concatenate(expanded_arrays, axis=axis, out=out)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    args = ([array([[array([ 0.01767829, -0.04421717, -0.04463257,  0.01125965], dtype=float32),
            {}]], dtype=object), array([[array([ 0.04047002,  0.04853883,  0.01476109, -0.02565673], dtype=float32),
            {}]], dtype=object)],)
    kwargs = {'axis': 0, 'out': array([[0., 0., 0., 0.],
           [0., 0., 0., 0.]], dtype=float32)}
    relevant_args = [array([[array([ 0.01767829, -0.04421717, -0.04463257,  0.01125965], dtype=float32),
            {}]], dtype=object), arra...65673], dtype=float32),
            {}]], dtype=object), array([[0., 0., 0., 0.],
           [0., 0., 0., 0.]], dtype=float32)]
    
    >   ???
    E   ValueError: Output array is the wrong shape
    
    <__array_function__ internals>:180: ValueError
    

    3

    test/test_vector/test_gym_vector.py:78 (test_mutliproc_single_proc_equivalency)
    def test_mutliproc_single_proc_equivalency():
            env = gym.make("CartPole-v1")
            num_envs = 3
            # uses single threaded vector environment
            venv1 = concat_vec_envs_v1(env, num_envs, num_cpus=0)
            # uses multiprocessing vector environment
            venv2 = concat_vec_envs_v1(env, num_envs, num_cpus=4)
    >       check_vec_env_equivalency(venv1, venv2)
    
    test/test_vector/test_gym_vector.py:86: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    test/test_vector/test_gym_vector.py:38: in check_vec_env_equivalency
        obs1 = venv1.reset(seed=51)
    supersuit/vector/concat_vec_env.py:46: in reset
        return self.concat_obs(_res_obs)
    supersuit/vector/concat_vec_env.py:68: in concat_obs
        return concatenate(
    /usr/lib/python3.9/functools.py:888: in wrapper
        return dispatch(args[0].__class__)(*args, **kw)
    venv/lib/python3.9/site-packages/gym/vector/utils/numpy_utils.py:50: in _concatenate_base
        return np.stack(items, axis=0, out=out)
    <__array_function__ internals>:180: in stack
        ???
    venv/lib/python3.9/site-packages/numpy/core/shape_base.py:433: in stack
        return _nx.concatenate(expanded_arrays, axis=axis, out=out)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    args = ([array([[array([ 0.04424412, -0.01928616, -0.02450934, -0.02773815], dtype=float32),
            {}]], dtype=object), arr...ct), array([[array([-0.0488873 ,  0.02189106, -0.01688728,  0.04330887], dtype=float32),
            {}]], dtype=object)],)
    kwargs = {'axis': 0, 'out': array([[0., 0., 0., 0.],
           [0., 0., 0., 0.],
           [0., 0., 0., 0.]], dtype=float32)}
    relevant_args = [array([[array([ 0.04424412, -0.01928616, -0.02450934, -0.02773815], dtype=float32),
            {}]], dtype=object), arra...       {}]], dtype=object), array([[0., 0., 0., 0.],
           [0., 0., 0., 0.],
           [0., 0., 0., 0.]], dtype=float32)]
    
    >   ???
    E   ValueError: Output array is the wrong shape
    
    <__array_function__ internals>:180: ValueError
    

    4

      File "/home/will/PycharmProjects/SuperSuit/supersuit/vector/multiproc_vec.py", line 73, in async_loop
        observations = vec_env.reset(seed=data[0], options=data[2])
      File "/home/will/PycharmProjects/SuperSuit/supersuit/vector/concat_vec_env.py", line 46, in reset
        return self.concat_obs(_res_obs)
      File "/home/will/PycharmProjects/SuperSuit/supersuit/vector/concat_vec_env.py", line 68, in concat_obs
        return concatenate(
      File "/usr/lib/python3.9/functools.py", line 888, in wrapper
        return dispatch(args[0].__class__)(*args, **kw)
      File "/home/will/PycharmProjects/SuperSuit/venv/lib/python3.9/site-packages/gym/vector/utils/numpy_utils.py", line 50, in _concatenate_base
        return np.stack(items, axis=0, out=out)
      File "<__array_function__ internals>", line 180, in stack
      File "/home/will/PycharmProjects/SuperSuit/venv/lib/python3.9/site-packages/numpy/core/shape_base.py", line 433, in stack
        return _nx.concatenate(expanded_arrays, axis=axis, out=out)
      File "<__array_function__ internals>", line 180, in concatenate
    ValueError: Output array is the wrong shape
    
    
    test/test_vector/test_gym_vector.py:99 (test_multiproc_buffer)
    def test_multiproc_buffer():
            num_envs = 2
            env = gym.make("CartPole-v1")
            env = concat_vec_envs_v1(env, num_envs, num_cpus=2)
        
    >       obss = env.reset()
    
    test/test_vector/test_gym_vector.py:105: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    supersuit/vector/multiproc_vec.py:183: in reset
        self._receive_info()
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    self = ProcConcatVec(2)
    
        def _receive_info(self):
            all_data = []
            for cin in self.pipes:
                data = cin.recv()
                if isinstance(data, tuple):
                    e, tb = data
                    print(tb)
    >               raise e
    E               ValueError: Output array is the wrong shape
    
    supersuit/vector/multiproc_vec.py:200: ValueError
    

    Reproduction

    Make sure you pip install PZ's master branch (`pip install git+...`); it is in `setup.py`.

    Run pytest to see the failing tests.

    opened by WillDudley 0
  • `concat_vec_envs_v1` prevents Python scripts from exiting

    `concat_vec_envs_v1` prevents Python scripts from exiting

    Hi,

    When I use concat_vec_envs_v1 when initializing my environment, the Python script will not exit.

    Even when I call env.close() at the end of the script, and use sys.exit() or similar, it still will not exit.

    Is there anything in particular I need to call to exit the script? Is this known to be an issue?

    Thanks Tim

    opened by timf34 3
  • Tricky thing with `agent_indicator_v0` in ALE

    Tricky thing with `agent_indicator_v0` in ALE

    Currently when applying agent_indicator_v0 with Atari games, agent_indicator_v0 adds two additional channels of [0, 1] for player 1 and [1, 0] for player 2, which looks like below:


        env = pong_v2.parallel_env()
        env = ss.color_reduction_v0(env, mode="B")
        env = ss.resize_v0(env, x_size=84, y_size=84)
        # env = ss.frame_stack_v1(env, 4)
        env = ss.agent_indicator_v0(env, type_only=False)
        env = ss.pettingzoo_env_to_vec_env_v1(env)
        envs = ss.concat_vec_envs_v1(env, args.num_envs // 2, num_cpus=0, base_class="gym")
    

    This may cause a problem for the learning algorithm, because the learning algorithm usually divides the channels by 255, so the agent indicator channels would end up looking like [0, 0.0039] and [0.0039, 0], which may not be distinguishable to the agent. Instead, agent_indicator_v0 should have used the maximum value of the observation space, like [255, 0] and [0, 255], as the agent indicator.

    opened by vwxyzjn 3
  • Is it possible to convert an AEC Environment with an Action_Mask to a Vector_Env in SuperSuit? (So it works well with Stable-Baselines3)

    Is it possible to convert an AEC Environment with an Action_Mask to a Vector_Env in SuperSuit? (So it works well with Stable-Baselines3)

    So, I have a custom AEC Environment written in PettingZoo that has an action_mask (it's a board game with a large number of "legal moves", so it's necessary to mask out illegal moves during training), but when I try to vectorize this aec_env I get the following error:

    ~\AppData\Local\Temp/ipykernel_15820/1366347090.py in <module>
          4 import supersuit as ss
          5 
    ----> 6 env = ss.vectorize_aec_env_v0(env, num_envs = 8, num_cpus = 4)
          7
    ~\AppData\Roaming\Python\Python37\site-packages\supersuit\aec_vector\create.py in vectorize_aec_env_v0(aec_env, num_envs, num_cpus)
         13         return SyncAECVectorEnv(env_list)
         14     else:
    ---> 15         return AsyncAECVectorEnv(env_list, num_cpus)
    
    ~\AppData\Roaming\Python\Python37\site-packages\supersuit\aec_vector\async_vector_env.py in __init__(self, env_constructors, num_cpus, return_copy)
        254                 SpaceWrapper(self.action_spaces[agent]),
        255             )
    --> 256             for agent in self.possible_agents
        257         }
        258 
    
    ~\AppData\Roaming\Python\Python37\site-packages\supersuit\aec_vector\async_vector_env.py in <dictcomp>(.0)
        254                 SpaceWrapper(self.action_spaces[agent]),
        255             )
    --> 256             for agent in self.possible_agents
        257         }
        258 
    
    ~\AppData\Roaming\Python\Python37\site-packages\supersuit\aec_vector\async_vector_env.py in __init__(self, space)
         22             self.low = space.low
         23         else:
    ---> 24             assert False, "ProcVectorEnv only support Box and Discrete types"
         25 
         26 
    
    AssertionError: ProcVectorEnv only support Box and Discrete types
    

    For reference, the action_space and the observation_space (which I suspect is the cause of this error due to the presence of the action_mask and the fact that it is a Dict instead of mapping directly to a gym.Space) of my environment are as follows:

    self.action_spaces = {name: spaces.Discrete(4672) for name in self.possible_agents}
    self.observation_spaces = {name: spaces.Dict({
        'observation': spaces.Box(low=-1, high=500, shape=(5, 8, 8, 14), dtype=np.int16),
        'action_mask': spaces.Box(low=0, high=1, shape=(4672,), dtype=np.int8)
    }) for name in self.possible_agents}

    Is there a way to convert my AEC Environment with an action_mask to a vector_env so I can use it to train with stable-baselines3 (specifically with PPO with a CNN)? If not, could you offer me some alternatives to train my environment with (I am basically a beginner in RL and I would love some advice on some libraries that have easy trainers with common RL algorithms like PPO with native CNN support).

    Thanks for your help.

    enhancement 
    opened by akshaygh0sh 3
Releases(3.7.1)
  • 3.7.1(Jan 2, 2023)

    What's Changed

    [experimental] Python 3.11 support

    • Update setup.py, add license and try to update readme by @jjshoots in https://github.com/Farama-Foundation/SuperSuit/pull/197
    • Change README to stub by @WillDudley in https://github.com/Farama-Foundation/SuperSuit/pull/198
    • fix: np.equal in test by @younik in https://github.com/Farama-Foundation/SuperSuit/pull/202

    Full Changelog: https://github.com/Farama-Foundation/SuperSuit/compare/3.7.0...3.7.1

  • 3.7.0(Oct 7, 2022)

    What's Changed

    • Update render api by @younik in https://github.com/Farama-Foundation/SuperSuit/pull/190
    • fix flake 8 by @jjshoots in https://github.com/Farama-Foundation/SuperSuit/pull/191
    • fix more flake 8 by @jjshoots in https://github.com/Farama-Foundation/SuperSuit/pull/192
    • remove sanity checks (see WillDudley/SuperSuitSanityChecks) by @WillDudley in https://github.com/Farama-Foundation/SuperSuit/pull/194
    • Update security permissions for GitHub workflows by @andrewtanJS in https://github.com/Farama-Foundation/SuperSuit/pull/193
    • version bump by @WillDudley in https://github.com/Farama-Foundation/SuperSuit/pull/195
    • bump ss dep by @WillDudley in https://github.com/Farama-Foundation/SuperSuit/pull/196

    New Contributors

    • @younik made their first contribution in https://github.com/Farama-Foundation/SuperSuit/pull/190
    • @andrewtanJS made their first contribution in https://github.com/Farama-Foundation/SuperSuit/pull/193

    Full Changelog: https://github.com/Farama-Foundation/SuperSuit/compare/3.6.0...3.7.0

  • 3.6.0(Sep 24, 2022)

    What's Changed

    1. As part of the Gym update to 0.26, the following change has been made:
      • done -> termination and truncation: The singular done signal has been changed to a termination and truncation signal, where termination dictates that the environment has ended due to meeting certain conditions, and truncation dictates that the environment has ended due to exceeding the time/frame limit.
    2. Deprecated ConcatVecEnv and ProcConcatVecEnv because they were buggy.
    3. General bug fixes.

    List of Changes

    • Fix out of date assertation message by @jjshoots in https://github.com/Farama-Foundation/SuperSuit/pull/174
    • Fix typecast bug from #175 by @jjshoots in https://github.com/Farama-Foundation/SuperSuit/pull/179
    • Master to API_Update by @jjshoots in https://github.com/Farama-Foundation/SuperSuit/pull/180
    • Update Lambda wrappers and fix a bunch of the tests by @jjshoots in https://github.com/Farama-Foundation/SuperSuit/pull/178
    • vectorize_aec_env_v0() trunc api update by @WillDudley in https://github.com/Farama-Foundation/SuperSuit/pull/182
    • Remove missing envs by @jjshoots in https://github.com/Farama-Foundation/SuperSuit/pull/185
    • Updating Generic Wrappers to the new Gym API by @reginald-mclean in https://github.com/Farama-Foundation/SuperSuit/pull/184
    • aec_mock_test fix by @jjshoots in https://github.com/Farama-Foundation/SuperSuit/pull/186
    • Multiagent wrappers, tests, vecenv, aecenv fixes & more by @WillDudley in https://github.com/Farama-Foundation/SuperSuit/pull/183
    • Truncation API Update by @jjshoots in https://github.com/Farama-Foundation/SuperSuit/pull/181

    New Contributors

    • @WillDudley made their first contribution in https://github.com/Farama-Foundation/SuperSuit/pull/182
    • @reginald-mclean made their first contribution in https://github.com/Farama-Foundation/SuperSuit/pull/184

    Full Changelog: https://github.com/Farama-Foundation/SuperSuit/compare/3.5.0...3.6.0

  • 3.5.0(Jun 21, 2022)

  • 3.4.0(Apr 29, 2022)

    • Support changes to PettingZoo seed API
    • Switch to tinyscalar instead of opencv for image resizing, bump versions of all applicable wrappers
  • 3.3.4(Mar 5, 2022)

  • 3.3.3(Jan 28, 2022)

  • 3.3.2(Nov 28, 2021)

  • 3.3.1(Nov 5, 2021)

  • 3.3.0(Oct 19, 2021)

    • Fixed bugs in observation space manipulation introduced in 3.2.0
    • Removed all references to action_spaces and observation_spaces, improving support for pettingzoo 1.12.0 observation and action space functions
    • Dropped python 3.6 support, added official 3.9 support
  • 3.2.0(Oct 8, 2021)

    • Added support for pettingzoo==1.12.0 and gym==0.21.0
    • Adds support for generated agents when possible
    • Removed supersuit 3.0 warning
    • Added warnings for inf bounds on the norm_obs wrapper
    • Uses pettingzoo's new BaseParallelWrapper utility instead of implementing its own
  • 3.1.0(Aug 19, 2021)

    • Fixes long standing issue with black death cumulative rewards, bumps version to v2
    • Fixes v3 regressions in pad_observations, pad_actions, and crashing issue with generic wrapper, does not bump versions
    • Cleans up dependencies
  • 3.0.1(Jul 8, 2021)

  • 3.0.0(Jul 7, 2021)

    • Full refactor of ordinary wrappers
      • Most supersuit wrappers now have a single, generic implementation that works for gym and pettingzoo envs. This makes adding new wrappers much easier.
      • File structure is reorganized.
    • Added scale_actions_v0 wrapper
    • Added nan_noop, nan_zeros, and nan_random wrappers to handle nan actions in a reasonable way. Important to deal with RL frameworks that occasionally give out nan values.
    • cv2 package is only imported when needed
  • 2.6.6(Jun 12, 2021)

  • 2.6.5(May 26, 2021)

    • Patched issue where environment random state would be the same in vector environments. Before this could be fixed by calling seed() on the vector environment, now this happens by default.
  • 2.6.4(Apr 27, 2021)

  • 2.6.3(Apr 20, 2021)

    • Fixed automatic space resizing for observation lambda wrapper
    • Added warning for giving vector env constructors an incorrect type
    • Added support for stable baselines env_is_wrapped method
  • 2.6.2(Mar 21, 2021)

    • Fixed bug in gym action lambda wrapper
    • Lambda wrappers can have an optional 'agent' parameter in multiagent environments
    • Fixes to vector environment rendering
    • Other fixes to vector environments
  • 2.6.0(Feb 21, 2021)

  • 2.5.1(Feb 11, 2021)

  • 2.5.0(Jan 29, 2021)

  • 2.4.0(Jan 13, 2021)

  • 2.3.1(Jan 10, 2021)

  • 2.3.0(Jan 5, 2021)

  • 2.2.0(Nov 7, 2020)
