Selfplay In MultiPlayer Environments

Table of Contents

  1. About The Project
  2. Getting Started
  3. Tutorial
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgements


About The Project

SIMPLE Diagram

This project allows you to train AI agents on custom-built multiplayer environments, through self-play reinforcement learning.

It implements Proximal Policy Optimisation (PPO), with a built-in wrapper around the multiplayer environments that handles the loading and action-taking of opponents in the environment. The wrapper delays the reward to the PPO agent until all opponents have taken their turn. In essence, it converts the multiplayer environment into a single-player environment that is constantly evolving as new versions of the policy network are added to the network bank.
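
To make the mechanics concrete, here is a rough sketch of what such a wrapper can look like. It is illustrative only, not the repo's actual wrapper class: names like SelfPlayWrapper, opponent_bank and choose_action are invented for the example, and it assumes the wrapped environment exposes n_players and current_player_num and returns one reward per player from step (see the Custom Environments section below).

    import random

    import gym


    class SelfPlayWrapper(gym.Wrapper):
        """Illustrative sketch only (hypothetical names, not the repo's API).

        Turns a turn-based multiplayer env into a single-player env by letting
        frozen opponent policies play the other seats; the learner only receives
        the state and accumulated reward once all opponents have taken their turn.
        """

        def __init__(self, env, opponent_bank):
            super().__init__(env)
            self.opponent_bank = opponent_bank  # frozen snapshots of earlier policies
            self.learner_seat = 0               # the PPO agent always sits in seat 0 here
            self.opponents = []

        def reset(self):
            # Sample a frozen opponent for each non-learner seat this episode.
            self.opponents = [random.choice(self.opponent_bank)
                              for _ in range(self.env.n_players - 1)]
            obs = self.env.reset()
            # If the learner is not first to act, let the opponents play their openings.
            done = False
            while not done and self.env.current_player_num != self.learner_seat:
                opponent = self.opponents[self.env.current_player_num - 1]  # seat k -> opponent k-1
                obs, _, done, _ = self.env.step(opponent.choose_action(obs))
            return obs

        def step(self, action):
            # The learner acts, then every opponent acts; the reward is only
            # handed back once the turn comes around to the learner again.
            obs, rewards, done, info = self.env.step(action)
            learner_reward = rewards[self.learner_seat]
            while not done and self.env.current_player_num != self.learner_seat:
                opponent = self.opponents[self.env.current_player_num - 1]
                obs, rewards, done, info = self.env.step(opponent.choose_action(obs))
                learner_reward += rewards[self.learner_seat]
            return obs, learner_reward, done, info

From the PPO agent's point of view this looks like an ordinary single-player Gym environment, while the bank of frozen opponents grows over time as new checkpoints are added.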

To learn more, check out the accompanying blog post.

This guide explains how to get started with the repo, add new custom environments and tune the hyperparameters of the system.

Have fun!


Getting Started

To get a local copy up and running, follow these simple steps.

Prerequisites

Install Docker and Docker Compose to make use of the docker-compose.yml file

Installation

  1. Clone the repo
    git clone https://github.com/davidADSP/SIMPLE.git
    cd SIMPLE
  2. Build the image and 'up' the container.
    docker-compose up -d
  3. Choose an environment to install in the container (tictactoe, connect4, sushigo and butterfly are currently implemented)
    bash ./scripts/install_env.sh sushigo

Tutorial

This is a quick tutorial to allow you to start using the two entrypoints into the codebase: test.py and train.py.

TODO - I'll be adding more substantial documentation for both of these entrypoints in due course! For now, descriptions of each command line argument can be found at the bottom of the files themselves.


Quickstart

test.py

This entrypoint allows you to play against a trained AI, pit two AIs against each other, or play against a baseline random model.

For example, try the following command to play against a baseline random model in the Sushi Go environment.

docker-compose exec app python3 test.py -d -g 1 -a base base human -e sushigo 

train.py

This entrypoint allows you to start training the AI using selfplay PPO. The underlying PPO engine is from the Stable Baselines package.

For example, you can start training the agent to learn how to play SushiGo with the following command:

docker-compose exec app python3 train.py -r -e sushigo 

After 30 or 40 iterations the process should have achieved a score above the default threshold of 0.2 and will output a new best_model.zip to the /zoo/sushigo folder.

Training runs until you kill the process manually (e.g. with Ctrl-C), so do that now.

You can now use the test.py entrypoint to play 100 games silently between the current best_model.zip and baseline random models as follows:

docker-compose exec app python3 test.py -g 100 -a best_model base base -e sushigo 

You should see that the best_model scores better than the two baseline model opponents.

Played 100 games: {'best_model_btkce': 31.0, 'base_sajsi': -15.5, 'base_poqaj': -15.5}

You can continue training the agent by dropping the -r reset flag from the train.py entrypoint arguments - it will just pick up from where it left off.

docker-compose exec app python3 train.py -e sushigo 

Congratulations, you've just completed one training cycle for the game Sushi Go! The PPO agent will now have to work out a way to beat the model it has just created...


Tensorboard

To monitor training, you can start Tensorboard with the following command:

bash scripts/tensorboard.sh

Navigate to localhost:6006 in a browser to view the output.

In the /zoo/pretrained/ folder there is a pre-trained best_model.zip for each game, which can be copied up a directory (e.g. to /zoo/sushigo/best_model.zip) if you want to test playing against a pre-trained agent right away.


Custom Environments

You can add a new environment by copying and editing an existing environment in the /environments/ folder.

For the environment to work with the SIMPLE self-play wrapper, the class must contain the following methods (expanding on the standard methods from the OpenAI Gym framework):

__init__

In the initialisation method, you need to define the usual action_space and observation_space, as well as two additional variables:

  • n_players - the number of players in the game
  • current_player_num - an integer that tracks which player is currently active  

step

The step method accepts an action from the current active player and performs the necessary steps to update the game environment. It should also update the current_player_num to the next player and check whether an end state of the game has been reached.

reset

The reset method is called to reset the game to the starting state, ready to accept the first action.

render

The render function is called to output a visual or human readable summary of the current game state to the log file.

observation

The observation function returns a numpy array that can be fed as input to the PPO policy network. It should return a numeric representation of the current game state, from the perspective of the current player, where each element of the array is in the range [-1,1].

legal_actions

The legal_actions function returns a numpy vector of the same length as the action space, where 1 indicates that the action is valid and 0 indicates that the action is invalid.

Please refer to existing environments for examples of how to implement each method; a minimal skeleton of the expected interface is also sketched below.
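
As a starting point, here is a minimal skeleton of that interface. It is a hypothetical two-player example: the class name, the game logic, and whether observation and legal_actions are implemented as plain methods or as properties are all placeholders, so follow the existing environments for the exact conventions.

    import gym
    import numpy as np
    from gym import spaces


    class MyGameEnv(gym.Env):
        """Hypothetical skeleton only; the game logic is stubbed out."""

        def __init__(self, verbose=False):
            super().__init__()
            self.name = 'mygame'
            self.n_players = 2                       # number of players in the game
            self.action_space = spaces.Discrete(9)   # e.g. 9 possible moves
            self.observation_space = spaces.Box(-1.0, 1.0, shape=(9,))
            self.verbose = verbose

        def reset(self):
            self.board = np.zeros(9, dtype=np.float32)
            self.current_player_num = 0              # which player acts next
            self.done = False
            return self.observation()

        def step(self, action):
            reward = [0.0] * self.n_players          # one reward per player
            # ...apply `action` for the current player and update self.board...
            # ...set self.done (and the rewards) if an end state has been reached...
            self.current_player_num = (self.current_player_num + 1) % self.n_players
            return self.observation(), reward, self.done, {}

        def render(self, mode='human'):
            print(self.board)                        # human-readable summary of the state

        def observation(self):
            # Numeric state from the current player's perspective, each element in [-1, 1].
            sign = 1.0 if self.current_player_num == 0 else -1.0
            return (sign * self.board).astype(np.float32)

        def legal_actions(self):
            # Vector of the same length as the action space: 1 = legal, 0 = illegal.
            return (self.board == 0).astype(np.float32)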

You will also need to add the environment to the two functions in /utils/register.py - follow the existing examples of environments for the structure.
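
The exact contents of /utils/register.py aren't reproduced here, but judging by the error messages further down this page, registering an environment in get_environment amounts to adding a branch that imports and returns your environment class. The snippet below only illustrates that pattern (the module and class names are hypothetical); check the file itself for the real structure and for the second function that also needs updating.

    # Hypothetical sketch of a get_environment branch; not the file's actual contents.
    def get_environment(env_name):
        try:
            if env_name == 'mygame':
                from mygame.envs.mygame import MyGameEnv   # your new environment package
                return MyGameEnv
            # ...branches for the existing environments...
            else:
                raise Exception(f'No environment found for {env_name}')
        except ModuleNotFoundError:
            raise Exception(f'Install the environment first using: \n'
                            f'bash scripts/install_env.sh {env_name}\n'
                            f'Also ensure the environment is added to /utils/register.py')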


Parallelisation

The training process can be parallelised using MPI across multiple cores.

For example, to run 10 parallel processes that contribute games to the current iteration, you can simply run:

docker-compose exec app mpirun -np 10 python3 train.py -e sushigo 

Roadmap

See the open issues for a list of proposed features (and known issues).


Contributing

Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the GPL-3.0. See LICENSE for more information.


Contact

David Foster - @davidADSP - [email protected]

Project Link: https://github.com/davidADSP/SIMPLE


Acknowledgements

There are many repositories and blogs that have helped me to put together this repository. One that deserves particular acknowledgement is David Ha's Slime Volleyball Gym, which also implements multi-agent reinforcement learning. It has helped me to understand how to adapt the callback function to a self-play setting and also how to implement MPI so that the codebase can be highly parallelised. Definitely worth checking out!


Comments
  • flamme rouge game initial version

    Implementation of the "flamme rouge" game (https://www.ultraboardgames.com/flamme-rouge/game-rules.php), with the "Peloton" extension, for 5 players. Proposed for merging into the source project, if it is relevant for you. The best_model needs much more training (only 2 hours on one CPU). The model can be largely improved; I am still learning deep learning...

    @davidADSP: great thanks for sharing this wonderful project!

    opened by zorgluf 6
  • AttributeError: module 'contextlib' has no attribute 'nullcontext'

    Exactly as the title says. I followed the steps from the README.

    > sudo docker-compose exec app python3 test.py -d -g 1 -a base base human -e sushigo 
    /home/selfplay/.local/lib/python3.6/site-packages/ale_py/roms/utils.py:90: DeprecationWarning: SelectableGroups dict interface is deprecated. Use select.
      for external in metadata.entry_points().get(self.group, []):
    Traceback (most recent call last):
      File "test.py", line 13, in <module>
        from stable_baselines import logger
      File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/__init__.py", line 3, in <module>
        from stable_baselines.a2c import A2C
      File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/a2c/__init__.py", line 1, in <module>
        from stable_baselines.a2c.a2c import A2C
      File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/a2c/a2c.py", line 3, in <module>
        import gym
      File "/home/selfplay/.local/lib/python3.6/site-packages/gym/__init__.py", line 13, in <module>
        from gym.envs import make, spec, register
      File "/home/selfplay/.local/lib/python3.6/site-packages/gym/envs/__init__.py", line 10, in <module>
        _load_env_plugins()
      File "/home/selfplay/.local/lib/python3.6/site-packages/gym/envs/registration.py", line 269, in load_env_plugins
        context = contextlib.nullcontext()
    AttributeError: module 'contextlib' has no attribute 'nullcontext'
    

    All output:

    > sudo docker-compose up -d                                                          
    [+] Running 1/1
     ⠿ Container selfplay  Started                                                                                                      0.5s
     > sudo bash ./scripts/install_env.sh sushigo                         
    Defaulting to user installation because normal site-packages is not writeable
    Obtaining file:///app/environments/sushigo
      Preparing metadata (setup.py) ... done
    Requirement already satisfied: gym>=0.9.4 in /home/selfplay/.local/lib/python3.6/site-packages (from sushigo==0.1.0) (0.21.0)
    Requirement already satisfied: numpy>=1.13.0 in /home/selfplay/.local/lib/python3.6/site-packages (from sushigo==0.1.0) (1.19.5)
    Requirement already satisfied: opencv-python>=3.4.2.0 in /home/selfplay/.local/lib/python3.6/site-packages (from sushigo==0.1.0) (4.5.4.58)
    Requirement already satisfied: cloudpickle>=1.2.0 in /home/selfplay/.local/lib/python3.6/site-packages (from gym>=0.9.4->sushigo==0.1.0) (2.0.0)
    Requirement already satisfied: importlib-metadata>=4.8.1 in /home/selfplay/.local/lib/python3.6/site-packages (from gym>=0.9.4->sushigo==0.1.0) (4.8.1)
    Requirement already satisfied: zipp>=0.5 in /home/selfplay/.local/lib/python3.6/site-packages (from importlib-metadata>=4.8.1->gym>=0.9.4->sushigo==0.1.0) (3.6.0)
    Requirement already satisfied: typing-extensions>=3.6.4 in /home/selfplay/.local/lib/python3.6/site-packages (from importlib-metadata>=4.8.1->gym>=0.9.4->sushigo==0.1.0) (3.10.0.2)
    Installing collected packages: sushigo
      Running setup.py develop for sushigo
    Successfully installed sushigo-0.1.0
    > sudo docker-compose exec app python3 test.py -d -g 1 -a base base human -e sushigo 
    /home/selfplay/.local/lib/python3.6/site-packages/ale_py/roms/utils.py:90: DeprecationWarning: SelectableGroups dict interface is deprecated. Use select.
      for external in metadata.entry_points().get(self.group, []):
    Traceback (most recent call last):
      File "test.py", line 13, in <module>
        from stable_baselines import logger
      File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/__init__.py", line 3, in <module>
        from stable_baselines.a2c import A2C
      File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/a2c/__init__.py", line 1, in <module>
        from stable_baselines.a2c.a2c import A2C
      File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/a2c/a2c.py", line 3, in <module>
        import gym
      File "/home/selfplay/.local/lib/python3.6/site-packages/gym/__init__.py", line 13, in <module>
        from gym.envs import make, spec, register
      File "/home/selfplay/.local/lib/python3.6/site-packages/gym/envs/__init__.py", line 10, in <module>
        _load_env_plugins()
      File "/home/selfplay/.local/lib/python3.6/site-packages/gym/envs/registration.py", line 269, in load_env_plugins
        context = contextlib.nullcontext()
    AttributeError: module 'contextlib' has no attribute 'nullcontext'
    
    opened by jay-tux 4
  • Error with training

    It seems like there's an error when I try to use the module with a custom env, which occurs after the first iteration:

    Optimizing...
         pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    Traceback (most recent call last):
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
        return fn(*args)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
        target_list, run_metadata)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
        run_metadata)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1024,2] vs. [1024]
             [[{{node gradients/loss/sub_8_grad/BroadcastGradientArgs}}]]
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "train.py", line 184, in <module>
        cli()
      File "train.py", line 179, in cli
        main(args)
      File "train.py", line 118, in main
        model.learn(total_timesteps=int(1e9), callback=[eval_callback], reset_num_timesteps = False, tb_log_name="tb")
      File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/ppo1/pposgd_simple.py", line 297, in learn
        cur_lrmult, sess=self.sess)
      File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/common/tf_util.py", line 330, in __call__
        results = sess.run(self.outputs_update, feed_dict=feed_dict, **kwargs)[:-1]
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
        run_metadata_ptr)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
        feed_dict_tensor, options, run_metadata)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
        run_metadata)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1024,2] vs. [1024]
             [[node gradients/loss/sub_8_grad/BroadcastGradientArgs (defined at /home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
    
    Original stack trace for 'gradients/loss/sub_8_grad/BroadcastGradientArgs':
      File "train.py", line 184, in <module>
        cli()
      File "train.py", line 179, in cli
        main(args)
      File "train.py", line 82, in main
        model = PPO1.load(os.path.join(model_dir, 'base.zip'), env, **params)
      File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/common/base_class.py", line 947, in load
        model.setup_model()
      File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/ppo1/pposgd_simple.py", line 193, in setup_model
        [self.summary, tf_util.flatgrad(total_loss, self.params)] + losses)
      File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/common/tf_util.py", line 381, in flatgrad
        grads = tf.gradients(loss, var_list)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/gradients_impl.py", line 158, in gradients
        unconnected_gradients)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/gradients_util.py", line 679, in _GradientsHelper
        lambda: grad_fn(op, *out_grads))
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/gradients_util.py", line 350, in _MaybeCompile
        return grad_fn()  # Exit early
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/gradients_util.py", line 679, in <lambda>
        lambda: grad_fn(op, *out_grads))
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py", line 1144, in _SubGrad
        SmartBroadcastGradientArgs(x, y, grad))
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py", line 99, in SmartBroadcastGradientArgs
        rx, ry = gen_array_ops.broadcast_gradient_args(sx, sy)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_array_ops.py", line 830, in broadcast_gradient_args
        "BroadcastGradientArgs", s0=s0, s1=s1, name=name)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
        op_def=op_def)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
        return func(*args, **kwargs)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
        attrs, op_def, compute_device)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
        op_def=op_def)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
        self._traceback = tf_stack.extract_stack()
    
    ...which was originally created as op 'loss/sub_8', defined at:
      File "train.py", line 184, in <module>
        cli()
    [elided 2 identical lines from previous traceback]
      File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/common/base_class.py", line 947, in load
        model.setup_model()
      File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/ppo1/pposgd_simple.py", line 147, in setup_model
        vf_loss = tf.reduce_mean(tf.square(self.policy_pi.value_flat - ret))
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/math_ops.py", line 899, in binary_op_wrapper
        return func(x, y, name=name)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 11086, in sub
        "Sub", x=x, y=y, name=name)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
        op_def=op_def)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
        return func(*args, **kwargs)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
        attrs, op_def, compute_device)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
        op_def=op_def)
      File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
        self._traceback = tf_stack.extract_stack()
    

    How could this be caused? I defined my action space as a discrete value with 11 possible values, and the observation as 2 values with 100 discrete values. I repurposed the Tic Tac Toe model, with a few changes below:

    import tensorflow as tf
    tf.get_logger().setLevel('INFO')
    tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
    
    from tensorflow.keras.layers import BatchNormalization, Activation, Flatten, Conv2D, Add, Dense, Dropout
    
    from stable_baselines.common.policies import ActorCriticPolicy
    from stable_baselines.common.distributions import CategoricalProbabilityDistributionType, CategoricalProbabilityDistribution
    
    
    class CustomPolicy(ActorCriticPolicy):
        def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=False, **kwargs):
            super(CustomPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=reuse, scale=True)
    
            with tf.variable_scope("model", reuse=reuse):
                
                self._policy = policy_head(self.processed_obs)
                self._value_fn, self.q_value = value_head(self.processed_obs)
    
                self._proba_distribution  = CategoricalProbabilityDistribution(self._policy)
    
                
            self._setup_init()
    
        def step(self, obs, state=None, mask=None, deterministic=True):
            if deterministic:
                action, value, neglogp = self.sess.run([self.deterministic_action, self.value_flat, self.neglogp],
                                                       {self.obs_ph: obs})
            else:
                action, value, neglogp = self.sess.run([self.action, self.value_flat, self.neglogp],
                                                       {self.obs_ph: obs})
            return action, value[0], self.initial_state, neglogp
    
        def proba_step(self, obs, state=None, mask=None):
            return self.sess.run(self.policy_proba, {self.obs_ph: obs})
    
        def value(self, obs, state=None, mask=None):
            return self.sess.run(self.value_flat, {self.obs_ph: obs})
    
    
    
    def value_head(y):
        vf = dense(y, 2, batch_norm = False, activation = 'tanh', name='vf')
        q = dense(y, 11, batch_norm = False, activation = 'tanh', name='q')
        return vf, q
    
    
    def policy_head(y):
        policy = dense(y, 11, batch_norm = False, activation = None, name='pi')
        return policy
    
    
    def resnet_extractor(y, **kwargs):
    
        y = convolutional(y, 32, 3)
        y = residual(y, 32, 3)
    
        return y
    
    
    
    def convolutional(y, filters, kernel_size):
        y = Conv2D(filters, kernel_size=kernel_size, strides=1, padding='same')(y)
        y = BatchNormalization(momentum = 0.9)(y)
        y = Activation('relu')(y)
        return y
    
    def residual(y, filters, kernel_size):
        shortcut = y
    
        y = Conv2D(filters, kernel_size=kernel_size, strides=1, padding='same')(y)
        y = BatchNormalization(momentum = 0.9)(y)
        y = Activation('relu')(y)
    
        y = Conv2D(filters, kernel_size=kernel_size, strides=1, padding='same')(y)
        y = BatchNormalization(momentum = 0.9)(y)
        y = Add()([shortcut, y])
        y = Activation('relu')(y)
    
        return y
    
    
    def dense(y, filters, batch_norm = True, activation = 'relu', name = None):
    
        if batch_norm or activation:
            y = Dense(filters)(y)
        else:
            y = Dense(filters, name = name)(y)
        
        if batch_norm:
            if activation:
                y = BatchNormalization(momentum = 0.9)(y)
            else:
                y = BatchNormalization(momentum = 0.9, name = name)(y)
    
        if activation:
            y = Activation(activation, name = name)(y)
        
        return y
    
    opened by neelr 2
  • "Permissions not granted"

    Following the readme. After bringing up the docker image and installing the sushigo env, I try to run test.py and encounter this error:

    > docker-compose exec app python3 test.py -d -g 1 -a base base human -e sushigo
    Logging to logs
    Saving base.zip PPO model...
    Permissions not granted on zoo/sushigo/...
    
    opened by aiannacc 2
  • Training and eval env are not of the same type

    When executing docker-compose exec app python3 train.py I'm getting a warning message:

    /home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/common/callbacks.py:287: UserWarning: Training and eval env are not of the same type<SelfPlayEnv instance> != <stable_baselines.common.vec_env.dummy_vec_env.DummyVecEnv object at 0x7f1db36e9c50>
      "{} != {}".format(self.training_env, self.eval_env))
    

    Besides that, the training works fine. Is it something I should worry about, or is it normal and I should ignore it?

    opened by Gieted 2
  • Adding other dependencies

    Our environment depends on the (external) library pycatan, which I tried to install by adding it to both the requirements.txt and the Dockerfile, yet I always get the error that Docker can't find the library:

    > sudo docker-compose exec app python3 train.py -r -e catan
    Logging to logs
    
    Setting up the selfplay training environment opponents...
    Traceback (most recent call last):
      File "/app/utils/register.py", line 24, in get_environment
        from catan.envs.catan import CatanEnv
      File "/app/environments/catan/catan/envs/__init__.py", line 1, in <module>
        from catan.envs.catan import CatanEnv
      File "/app/environments/catan/catan/envs/catan.py", line 3, in <module>
        from pycatan import Resource, Coords, Path, Intersection, Hex
    ModuleNotFoundError: No module named 'pycatan'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "train.py", line 188, in <module>
        cli()
      File "train.py", line 183, in cli
        main(args)
      File "train.py", line 58, in main
        base_env = get_environment(args.env_name)
      File "/app/utils/register.py", line 32, in get_environment
        raise Exception(f'Install the environment first using: \nbash scripts/install_env.sh {env_name}\nAlso ensure the environment is added to /utils/register.py')
    Exception: Install the environment first using: 
    bash scripts/install_env.sh catan
    Also ensure the environment is added to /utils/register.py
    

    However, the environment is in /utils/register.py, and the install script has run:

    > sudo docker-compose exec app pip3 install pycatan        
    Defaulting to user installation because normal site-packages is not writeable
    Requirement already satisfied: pycatan in /home/selfplay/.local/lib/python3.6/site-packages (0.13)
    Requirement already satisfied: quotequail in /home/selfplay/.local/lib/python3.6/site-packages (from pycatan) (0.2.3)
    

    The setup clearly states that the requirement is satisfied, yet I can't seem to get it loaded?

    opened by jay-tux 1
  • Permissions not granted on zoo/sushigo/...

    When running docker-compose exec app python3 test.py -d -g 1 -a base base human -e sushigo I get the error:

    Logging to logs
    Saving base.zip PPO model...
    Permissions not granted on zoo/sushigo/...
    ERROR: 1

    opened by Timbobo16 1
  • Permission Denied on environment install

    Hi all,

    Been trying to run the install steps as indicated in the Readme. I have encountered issues on step 3 of the install process.

    On bash ./scripts/install_env.sh sushigo I get the message Defaulting to user installation because normal site-packages is not writeable. The script keeps on running, but a few lines later I get the following issue:

    Installing collected packages: sushigo
      Running setup.py develop for sushigo
        ERROR: Command errored out with exit status 1:
         command: /usr/bin/python3 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/app/environments/sushigo/setup.py'"'"'; __file__='"'"'/app/environments/sushigo/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps --user --prefix=
             cwd: /app/environments/sushigo/
        Complete output (4 lines):
        running develop
        running egg_info
        creating sushigo.egg-info
        error: could not create 'sushigo.egg-info': Permission denied
        ----------------------------------------
    ERROR: Command errored out with exit status 1: /usr/bin/python3 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/app/environments/sushigo/setup.py'"'"'; __file__='"'"'/app/environments/sushigo/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps --user --prefix= Check the logs for full command output.
    

    Has anyone else experienced this?

    opened by sovnheim 0
  • Geschenkt

    • Adding new game - Geschenkt (aka No Thanks https://boardgamearena.com/gamepanel?game=nothanks)
    • adding functionality to manually update the game state
    • adding recommendation as a flag so that it works even when all humans are playing
    • changing continue game logic to ensure it still works when the player doesn't change after an AI turn
    opened by davidADSP 0
  • Unable to install the environment due to connection errors

    I am unable to install the environment needed to start testing the SIMPLE features. I need to solve this problem before starting to develop my own custom board game project.

    (screenshot of the error)

    opened by JavierChicoOfc 1
  • Question: Does SIMPLE support board games with simultaneous action selection?

    Hi,

    I am new to ML but have been researching MuZero. Thanks to your article I found SIMPLE. My question is: is it possible to create a policy network that takes into account that both players will choose a move to make? Because the next state relies on both players' choices, there is a bit of nuance in the policy network: it will have to account for the opponents, and there might be a dependency between them.

    I am not sure if this is supported out of the box or if there is any research in this regard.

    Thanks

    opened by moscoso 0
  • Exporting to TF SavedModel/TFLite

    I'm trying to export the resulting model to TFLite so I can run inference on another device, but I'm hitting some issues. I found instructions on how to export a model in the Stable Baselines documentation and tried adapting it for PPO1 instead of PPO2; however, when I try to load the resulting SavedModel I get an exception about a Tensor not existing.

    Here's the code:

      ppo_model = load_model(env, 'best_model.zip')
    
      tf.saved_model.simple_save(ppo_model.sess, "TEST_OUTPUT", inputs={"obs": ppo_model.policy_pi.obs_ph},
                                       outputs={"action": ppo_model.policy_pi._policy_proba})
    
      converter = tf.lite.TFLiteConverter.from_saved_model("TEST_OUTPUT")
      tflite_model = converter.convert()
    

    And the full error message: KeyError: "The name 'input/Ob:0' refers to a Tensor which does not exist. The operation, 'input/Ob', does not exist in the graph."

    I've verified that ppo_model is being loaded correctly by running the inference (using ppo_model.action_probability()), so I don't believe there's an issue there. The SavedModel directory does get created on the tf.saved_model.simple_save step, however I believe it may not be a complete export as the size is very small.

    I'm rather new to the ML side of things, so there might be something obvious that I'm missing; any help would be greatly appreciated!

    Thanks for putting together this great library!

    opened by maciel310 0
  • Training Flamme Rouge model for 5 players stops

    When I try to train my frouge model after setting the number of players to 5, it can start the training. However, when it reaches Optimizing..., it just stops the training and returns to the command prompt. Any ideas on how this can be fixed would be a great help for me. Thanks!

    opened by celfcelfcelfcelf 5
  • add support for apple m1 chip

    The current setup in docker-compose.yml doesn't allow the environment to be set up on a Mac machine with an M1 chip. This MR will take care of this problem.

    opened by ROZBEH 0