Selfplay In MultiPlayer Environments

Last update: Jan 8, 2023

About The Project
Getting Started
- Prerequisites
- Installation
Tutorial
- Quickstart
- Tensorboard
- Custom Environments
- Parallelisation

Roadmap
Contributing
License
Contact
Acknowledgements

About The Project

This project allows you to train AI agents on custom-built multiplayer environments, through self-play reinforcement learning.

It implements Proximal Policy Optimisation (PPO), with a built-in wrapper around the multiplayer environments that handles the loading and action-taking of opponents in the environment. The wrapper withholds the reward from the PPO agent until all opponents have taken their turns. In essence, it converts the multiplayer environment into a single-player environment that is constantly evolving as new versions of the policy network are added to the network bank.
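The idea of the wrapper can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the repo's implementation: `CountdownGame`, `SelfPlayWrapper`, the callable-based policy bank and the return signatures are all hypothetical names and choices made for this example.

```python
import random


class CountdownGame:
    """Hypothetical 2-player game: take 1 or 2 counters per turn; taking the last one wins."""

    n_players = 2

    def reset(self):
        self.counters = 7
        self.current_player_num = 0
        return self.counters

    def step(self, action):
        # Action 0 takes one counter, action 1 takes two (capped at what is left).
        self.counters -= min(action + 1, self.counters)
        done = self.counters == 0
        reward = [0.0, 0.0]
        if done:
            reward[self.current_player_num] = 1.0
            reward[1 - self.current_player_num] = -1.0
        else:
            self.current_player_num = 1 - self.current_player_num
        return self.counters, reward, done


class SelfPlayWrapper:
    """Presents the multiplayer game to one learning agent as a single-player environment."""

    def __init__(self, game, policy_bank, agent_player_num=0, seed=0):
        self.game = game
        self.policy_bank = policy_bank  # assumed: callables mapping observation -> action
        self.agent_player_num = agent_player_num
        self.rng = random.Random(seed)

    def reset(self):
        # Load an opponent from the bank for this episode.
        self.opponent = self.rng.choice(self.policy_bank)
        obs = self.game.reset()
        done = False
        # If the agent does not move first, let the opponent play up to the agent's turn.
        while not done and self.game.current_player_num != self.agent_player_num:
            obs, _, done = self.game.step(self.opponent(obs))
        return obs

    def step(self, action):
        obs, reward, done = self.game.step(action)
        agent_reward = reward[self.agent_player_num]
        # Hold the reward back until every opponent has taken its turn.
        while not done and self.game.current_player_num != self.agent_player_num:
            obs, reward, done = self.game.step(self.opponent(obs))
            agent_reward += reward[self.agent_player_num]
        return obs, agent_reward, done
```

From the agent's point of view, each call to `step` now covers its own move plus the opponent's reply, so a loss caused by the opponent's move is still attributed to the agent's last action.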

To learn more, check out the accompanying blog post.

This guide explains how to get started with the repo, add new custom environments and tune the hyperparameters of the system.

Have fun!

Getting Started

To get a local copy up and running, follow these simple steps.

Prerequisites

Install Docker and Docker Compose to make use of the docker-compose.yml file.

Installation

Clone the repo

git clone https://github.com/davidADSP/SIMPLE.git
cd SIMPLE

Build the image and bring up the container:
```
docker-compose up -d
```
Choose an environment to install in the container (tictactoe, connect4, sushigo and butterfly are currently implemented):
```
bash ./scripts/install_env.sh sushigo
```

Tutorial

This is a quick tutorial to allow you to start using the two entrypoints into the codebase: test.py and train.py.

TODO - I'll be adding more substantial documentation for both of these entrypoints in due course! For now, descriptions of each command line argument can be found at the bottom of the files themselves.

Quickstart

`test.py`

This entrypoint allows you to play against a trained AI, pit two AIs against each other or play against a baseline random model.

For example, try the following command to play against a baseline random model in the Sushi Go environment.

docker-compose exec app python3 test.py -d -g 1 -a base base human -e sushigo

`train.py`

This entrypoint allows you to start training the AI using selfplay PPO. The underlying PPO engine is from the Stable Baselines package.

For example, you can start training the agent to learn how to play SushiGo with the following command:

docker-compose exec app python3 train.py -r -e sushigo

After 30 or 40 iterations, the agent should have exceeded the default threshold score of 0.2, at which point the process outputs a new best_model.zip to the /zoo/sushigo folder.

Training runs until you kill the process manually (e.g. with Ctrl-C), so do that now.

You can now use the test.py entrypoint to play 100 games silently between the current best_model.zip and two random baseline models as follows:

docker-compose exec app python3 test.py -g 100 -a best_model base base -e sushigo

You should see that the best_model scores better than the two baseline model opponents.

Played 100 games: {'best_model_btkce': 31.0, 'base_sajsi': -15.5, 'base_poqaj': -15.5}

You can continue training the agent by dropping the -r reset flag from the train.py entrypoint arguments; training then picks up from where it left off.

docker-compose exec app python3 train.py -e sushigo

Congratulations, you've just completed one training cycle for the game Sushi Go! The PPO agent will now have to work out a way to beat the model it has just created...
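The promotion step of a training cycle can be pictured with a short sketch. It is only an illustration of the rule described above: `maybe_promote` and the list-based model bank are hypothetical, and the only value taken from this guide is the default threshold of 0.2.

```python
def maybe_promote(model_bank, candidate, episode_rewards, threshold=0.2):
    """Add the candidate to the bank if its mean evaluation reward beats the threshold."""
    mean_reward = sum(episode_rewards) / len(episode_rewards)
    if mean_reward > threshold:
        # The candidate becomes the new opponent that future versions must beat.
        model_bank.append(candidate)
        return True
    return False


bank = ["base"]
maybe_promote(bank, "model_1", [1, -1, 1, 1, -1])  # mean 0.2: not above the threshold
maybe_promote(bank, "model_2", [1, 1, 1, -1, 1])   # mean 0.6: promoted
print(bank)
```

Only `model_2` ends up in the bank, because a mean reward equal to the threshold does not count as beating it in this sketch.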

Tensorboard

To monitor training, you can start Tensorboard with the following command:

bash scripts/tensorboard.sh

Navigate to localhost:6006 in a browser to view the output.

In the /zoo/pretrained/ folder there is a pre-trained best_model.zip for each game, which can be copied up a directory (e.g. to /zoo/sushigo/best_model.zip) if you want to play against a pre-trained agent right away.

Custom Environments

You can add a new environment by copying and editing an existing environment in the /environments/ folder.

For the environment to work with the SIMPLE self-play wrapper, the class must contain the following methods (expanding on the standard methods from the OpenAI Gym framework):

__init__

In the initialisation method, you need to define the usual action_space and observation_space, as well as two additional variables:

n_players - the number of players in the game
current_player_num - an integer that tracks which player is currently active

step

The step method accepts an action from the current active player and performs the necessary steps to update the game environment. It should also update current_player_num to the next player and check whether an end state of the game has been reached.

reset

The reset method is called to reset the game to the starting state, ready to accept the first action.

render

The render function is called to output a visual or human readable summary of the current game state to the log file.

observation

The observation function returns a numpy array that can be fed as input to the PPO policy network. It should return a numeric representation of the current game state, from the perspective of the current player, where each element of the array is in the range [-1,1].

legal_actions

The legal_actions function returns a numpy vector of the same length as the action space, where 1 indicates that the action is valid and 0 indicates that the action is invalid.
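One common way to use such a mask is to zero out the probability of invalid actions before choosing a move. The sketch below shows that idea in plain Python; `masked_softmax` is a hypothetical helper for illustration, and it is an assumption that the mask is applied to raw policy scores in this way.

```python
import math


def masked_softmax(logits, legal_actions):
    """Softmax over the legal actions only; illegal actions get probability 0."""
    if not any(legal_actions):
        raise ValueError("at least one action must be legal")
    # Replace the score of every illegal action with -inf so exp() maps it to 0.
    masked = [score if ok else -math.inf for score, ok in zip(logits, legal_actions)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in masked]
    total = sum(exps)
    return [value / total for value in exps]


# The third action has the highest score but is illegal, so it is never chosen.
print(masked_softmax([0.0, 0.0, 5.0], [1, 1, 0]))
```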

Please refer to existing environments for examples of how to implement each method.

You will also need to add the environment to the two functions in /utils/register.py - follow the existing examples of environments for the structure.
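The method list above can be put together into a skeleton like the following. It is a dependency-free sketch, not one of the repo's environments: `NimEnv` and its rules are hypothetical, plain lists stand in for numpy arrays, `n_actions` stands in for a Gym `action_space`, and returning one reward per player from `step` is an assumption. The existing environments may also expose `observation` and `legal_actions` as properties rather than methods, so check one before copying this structure.

```python
class NimEnv:
    """Hypothetical 2-player game: take 1 to 3 counters per turn; taking the last one wins."""

    def __init__(self, start_counters=10):
        self.name = "nim"
        self.n_players = 2            # number of players in the game
        self.current_player_num = 0   # which player is currently active
        self.n_actions = 3            # stand-in for a Gym Discrete(3) action_space
        self.start_counters = start_counters
        self.counters = start_counters
        self.done = False

    def observation(self):
        # One feature in [0, 1], which lies inside the required [-1, 1] range.
        return [self.counters / self.start_counters]

    def legal_actions(self):
        # Action i means "take i + 1 counters"; 1 marks a valid action, 0 an invalid one.
        return [1 if i + 1 <= self.counters else 0 for i in range(self.n_actions)]

    def step(self, action):
        player = self.current_player_num
        reward = [0.0] * self.n_players
        if self.legal_actions()[action] == 0:
            # Assumption: an illegal move loses the game immediately.
            reward = [1.0] * self.n_players
            reward[player] = -1.0
            self.done = True
        else:
            self.counters -= action + 1
            if self.counters == 0:
                # End state reached: the active player took the last counter and wins.
                reward = [-1.0] * self.n_players
                reward[player] = 1.0
                self.done = True
            else:
                # Hand the turn to the next player.
                self.current_player_num = (player + 1) % self.n_players
        return self.observation(), reward, self.done, {}

    def reset(self):
        self.counters = self.start_counters
        self.current_player_num = 0
        self.done = False
        return self.observation()

    def render(self):
        print(f"Counters left: {self.counters}, next player: {self.current_player_num}")
```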

Parallelisation

The training process can be parallelised using MPI across multiple cores.

For example, to run 10 parallel processes that contribute games to the current iteration, you can simply run:

docker-compose exec app mpirun -np 10 python3 train.py -e sushigo

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Any contributions you make are greatly appreciated.

1. Fork the Project
2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
3. Commit your Changes (git commit -m 'Add some AmazingFeature')
4. Push to the Branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

License

Distributed under the GPL-3.0. See LICENSE for more information.

Contact

David Foster - @davidADSP

Project Link: https://github.com/davidADSP/SIMPLE

Acknowledgements

There are many repositories and blogs that have helped me to put together this repository. One that deserves particular acknowledgement is David Ha's Slime Volleyball Gym, which also implements multi-agent reinforcement learning. It has helped me to understand how to adapt the callback function to a self-play setting and also how to implement MPI so that the codebase can be highly parallelised. Definitely worth checking out!

David Ha - Slime Volleyball Gym

Comments

flamme rouge game initial version

Implementation of the "Flamme Rouge" game (https://www.ultraboardgames.com/flamme-rouge/game-rules.php), with the "Peloton" expansion, for 5 players. Proposed for merge into the source project, if it is relevant for you. The best_model needs much more training (only 2 hours on one CPU so far). The model can be largely improved; I am still learning deep learning...

@davidADSP: many thanks for sharing this wonderful project!

opened by zorgluf 6

AttributeError: module 'contextlib' has no attribute 'nullcontext'

Exactly as the title says. I followed the steps from the README.

> sudo docker-compose exec app python3 test.py -d -g 1 -a base base human -e sushigo 
/home/selfplay/.local/lib/python3.6/site-packages/ale_py/roms/utils.py:90: DeprecationWarning: SelectableGroups dict interface is deprecated. Use select.
  for external in metadata.entry_points().get(self.group, []):
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    from stable_baselines import logger
  File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/__init__.py", line 3, in <module>
    from stable_baselines.a2c import A2C
  File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/a2c/__init__.py", line 1, in <module>
    from stable_baselines.a2c.a2c import A2C
  File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/a2c/a2c.py", line 3, in <module>
    import gym
  File "/home/selfplay/.local/lib/python3.6/site-packages/gym/__init__.py", line 13, in <module>
    from gym.envs import make, spec, register
  File "/home/selfplay/.local/lib/python3.6/site-packages/gym/envs/__init__.py", line 10, in <module>
    _load_env_plugins()
  File "/home/selfplay/.local/lib/python3.6/site-packages/gym/envs/registration.py", line 269, in load_env_plugins
    context = contextlib.nullcontext()
AttributeError: module 'contextlib' has no attribute 'nullcontext'

All output:

> sudo docker-compose up -d                                                          
[+] Running 1/1
 ⠿ Container selfplay  Started                                                                                                      0.5s
 > sudo bash ./scripts/install_env.sh sushigo                         
Defaulting to user installation because normal site-packages is not writeable
Obtaining file:///app/environments/sushigo
  Preparing metadata (setup.py) ... done
Requirement already satisfied: gym>=0.9.4 in /home/selfplay/.local/lib/python3.6/site-packages (from sushigo==0.1.0) (0.21.0)
Requirement already satisfied: numpy>=1.13.0 in /home/selfplay/.local/lib/python3.6/site-packages (from sushigo==0.1.0) (1.19.5)
Requirement already satisfied: opencv-python>=3.4.2.0 in /home/selfplay/.local/lib/python3.6/site-packages (from sushigo==0.1.0) (4.5.4.58)
Requirement already satisfied: cloudpickle>=1.2.0 in /home/selfplay/.local/lib/python3.6/site-packages (from gym>=0.9.4->sushigo==0.1.0) (2.0.0)
Requirement already satisfied: importlib-metadata>=4.8.1 in /home/selfplay/.local/lib/python3.6/site-packages (from gym>=0.9.4->sushigo==0.1.0) (4.8.1)
Requirement already satisfied: zipp>=0.5 in /home/selfplay/.local/lib/python3.6/site-packages (from importlib-metadata>=4.8.1->gym>=0.9.4->sushigo==0.1.0) (3.6.0)
Requirement already satisfied: typing-extensions>=3.6.4 in /home/selfplay/.local/lib/python3.6/site-packages (from importlib-metadata>=4.8.1->gym>=0.9.4->sushigo==0.1.0) (3.10.0.2)
Installing collected packages: sushigo
  Running setup.py develop for sushigo
Successfully installed sushigo-0.1.0
> sudo docker-compose exec app python3 test.py -d -g 1 -a base base human -e sushigo 
/home/selfplay/.local/lib/python3.6/site-packages/ale_py/roms/utils.py:90: DeprecationWarning: SelectableGroups dict interface is deprecated. Use select.
  for external in metadata.entry_points().get(self.group, []):
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    from stable_baselines import logger
  File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/__init__.py", line 3, in <module>
    from stable_baselines.a2c import A2C
  File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/a2c/__init__.py", line 1, in <module>
    from stable_baselines.a2c.a2c import A2C
  File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/a2c/a2c.py", line 3, in <module>
    import gym
  File "/home/selfplay/.local/lib/python3.6/site-packages/gym/__init__.py", line 13, in <module>
    from gym.envs import make, spec, register
  File "/home/selfplay/.local/lib/python3.6/site-packages/gym/envs/__init__.py", line 10, in <module>
    _load_env_plugins()
  File "/home/selfplay/.local/lib/python3.6/site-packages/gym/envs/registration.py", line 269, in load_env_plugins
    context = contextlib.nullcontext()
AttributeError: module 'contextlib' has no attribute 'nullcontext'

opened by jay-tux 4

Error with training

It seems like there's an error when I try to use the module with a custom env, which occurs after the first iteration:

Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
Traceback (most recent call last):
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1024,2] vs. [1024]
         [[{{node gradients/loss/sub_8_grad/BroadcastGradientArgs}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 184, in <module>
    cli()
  File "train.py", line 179, in cli
    main(args)
  File "train.py", line 118, in main
    model.learn(total_timesteps=int(1e9), callback=[eval_callback], reset_num_timesteps = False, tb_log_name="tb")
  File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/ppo1/pposgd_simple.py", line 297, in learn
    cur_lrmult, sess=self.sess)
  File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/common/tf_util.py", line 330, in __call__
    results = sess.run(self.outputs_update, feed_dict=feed_dict, **kwargs)[:-1]
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1024,2] vs. [1024]
         [[node gradients/loss/sub_8_grad/BroadcastGradientArgs (defined at /home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'gradients/loss/sub_8_grad/BroadcastGradientArgs':
  File "train.py", line 184, in <module>
    cli()
  File "train.py", line 179, in cli
    main(args)
  File "train.py", line 82, in main
    model = PPO1.load(os.path.join(model_dir, 'base.zip'), env, **params)
  File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/common/base_class.py", line 947, in load
    model.setup_model()
  File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/ppo1/pposgd_simple.py", line 193, in setup_model
    [self.summary, tf_util.flatgrad(total_loss, self.params)] + losses)
  File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/common/tf_util.py", line 381, in flatgrad
    grads = tf.gradients(loss, var_list)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/gradients_impl.py", line 158, in gradients
    unconnected_gradients)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/gradients_util.py", line 679, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/gradients_util.py", line 350, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/gradients_util.py", line 679, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py", line 1144, in _SubGrad
    SmartBroadcastGradientArgs(x, y, grad))
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py", line 99, in SmartBroadcastGradientArgs
    rx, ry = gen_array_ops.broadcast_gradient_args(sx, sy)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_array_ops.py", line 830, in broadcast_gradient_args
    "BroadcastGradientArgs", s0=s0, s1=s1, name=name)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

...which was originally created as op 'loss/sub_8', defined at:
  File "train.py", line 184, in <module>
    cli()
[elided 2 identical lines from previous traceback]
  File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/common/base_class.py", line 947, in load
    model.setup_model()
  File "/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/ppo1/pposgd_simple.py", line 147, in setup_model
    vf_loss = tf.reduce_mean(tf.square(self.policy_pi.value_flat - ret))
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/math_ops.py", line 899, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 11086, in sub
    "Sub", x=x, y=y, name=name)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/selfplay/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

How could this be caused? I defined my action space as a discrete value with 11 possible values, and the observation as 2 values with 100 discrete values each. I repurposed the Tic Tac Toe model, with a few changes below:

import tensorflow as tf
tf.get_logger().setLevel('INFO')
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

from tensorflow.keras.layers import BatchNormalization, Activation, Flatten, Conv2D, Add, Dense, Dropout

from stable_baselines.common.policies import ActorCriticPolicy
from stable_baselines.common.distributions import CategoricalProbabilityDistributionType, CategoricalProbabilityDistribution


class CustomPolicy(ActorCriticPolicy):
    def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=False, **kwargs):
        super(CustomPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=reuse, scale=True)

        with tf.variable_scope("model", reuse=reuse):
            
            self._policy = policy_head(self.processed_obs)
            self._value_fn, self.q_value = value_head(self.processed_obs)

            self._proba_distribution  = CategoricalProbabilityDistribution(self._policy)

            
        self._setup_init()

    def step(self, obs, state=None, mask=None, deterministic=True):
        if deterministic:
            action, value, neglogp = self.sess.run([self.deterministic_action, self.value_flat, self.neglogp],
                                                   {self.obs_ph: obs})
        else:
            action, value, neglogp = self.sess.run([self.action, self.value_flat, self.neglogp],
                                                   {self.obs_ph: obs})
        return action, value[0], self.initial_state, neglogp

    def proba_step(self, obs, state=None, mask=None):
        return self.sess.run(self.policy_proba, {self.obs_ph: obs})

    def value(self, obs, state=None, mask=None):
        return self.sess.run(self.value_flat, {self.obs_ph: obs})



def value_head(y):
    vf = dense(y, 2, batch_norm = False, activation = 'tanh', name='vf')
    q = dense(y, 11, batch_norm = False, activation = 'tanh', name='q')
    return vf, q


def policy_head(y):
    policy = dense(y, 11, batch_norm = False, activation = None, name='pi')
    return policy


def resnet_extractor(y, **kwargs):

    y = convolutional(y, 32, 3)
    y = residual(y, 32, 3)

    return y



def convolutional(y, filters, kernel_size):
    y = Conv2D(filters, kernel_size=kernel_size, strides=1, padding='same')(y)
    y = BatchNormalization(momentum = 0.9)(y)
    y = Activation('relu')(y)
    return y

def residual(y, filters, kernel_size):
    shortcut = y

    y = Conv2D(filters, kernel_size=kernel_size, strides=1, padding='same')(y)
    y = BatchNormalization(momentum = 0.9)(y)
    y = Activation('relu')(y)

    y = Conv2D(filters, kernel_size=kernel_size, strides=1, padding='same')(y)
    y = BatchNormalization(momentum = 0.9)(y)
    y = Add()([shortcut, y])
    y = Activation('relu')(y)

    return y


def dense(y, filters, batch_norm = True, activation = 'relu', name = None):

    if batch_norm or activation:
        y = Dense(filters)(y)
    else:
        y = Dense(filters, name = name)(y)
    
    if batch_norm:
        if activation:
            y = BatchNormalization(momentum = 0.9)(y)
        else:
            y = BatchNormalization(momentum = 0.9, name = name)(y)

    if activation:
        y = Activation(activation, name = name)(y)
    
    return y

opened by neelr 2

"Permissions not granted"
Following the readme. After bringing up the docker image and installing the sushigo env, I try to run test.py and encounter this error:

> docker-compose exec app python3 test.py -d -g 1 -a base base human -e sushigo
Logging to logs
Saving base.zip PPO model...
Permissions not granted on zoo/sushigo/...
opened by aiannacc 2

Training and eval env are not of the same type

When executing docker-compose exec app python3 train.py I'm getting a warning message:

/home/selfplay/.local/lib/python3.6/site-packages/stable_baselines/common/callbacks.py:287: UserWarning: Training and eval env are not of the same type<SelfPlayEnv instance> != <stable_baselines.common.vec_env.dummy_vec_env.DummyVecEnv object at 0x7f1db36e9c50>
  "{} != {}".format(self.training_env, self.eval_env))

Besides that the training works fine. Is it something I should worry about, or it's normal, and I should ignore it?

opened by Gieted 2

Adding other dependencies

Our environment depends on the (external) library pycatan, which I tried to install by adding to both the requirements.txt and the Dockerfile, yet I always get the error that the docker can't find the library:

> sudo docker-compose exec app python3 train.py -r -e catan
Logging to logs

Setting up the selfplay training environment opponents...
Traceback (most recent call last):
  File "/app/utils/register.py", line 24, in get_environment
    from catan.envs.catan import CatanEnv
  File "/app/environments/catan/catan/envs/__init__.py", line 1, in <module>
    from catan.envs.catan import CatanEnv
  File "/app/environments/catan/catan/envs/catan.py", line 3, in <module>
    from pycatan import Resource, Coords, Path, Intersection, Hex
ModuleNotFoundError: No module named 'pycatan'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 188, in <module>
    cli()
  File "train.py", line 183, in cli
    main(args)
  File "train.py", line 58, in main
    base_env = get_environment(args.env_name)
  File "/app/utils/register.py", line 32, in get_environment
    raise Exception(f'Install the environment first using: \nbash scripts/install_env.sh {env_name}\nAlso ensure the environment is added to /utils/register.py')
Exception: Install the environment first using: 
bash scripts/install_env.sh catan
Also ensure the environment is added to /utils/register.py

However, the environment is in /utils/register.py, and the install script has run:

> sudo docker-compose exec app pip3 install pycatan        
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pycatan in /home/selfplay/.local/lib/python3.6/site-packages (0.13)
Requirement already satisfied: quotequail in /home/selfplay/.local/lib/python3.6/site-packages (from pycatan) (0.2.3)

The setup clearly states that the requirement is satisfied, yet I can't seem to get it loaded?

opened by jay-tux 1

Permissions not granted on zoo/sushigo/...

When running docker-compose exec app python3 test.py -d -g 1 -a base base human -e sushigo I get the following error:

Logging to logs
Saving base.zip PPO model...
Permissions not granted on zoo/sushigo/...
ERROR: 1

opened by Timbobo16 1

Permission Denied on environment install

Hi all,

I have been trying to run the install steps as indicated in the Readme, and I have run into issues on step 3 of the install process.

When running bash ./scripts/install_env.sh sushigo, I get the message "Defaulting to user installation because normal site-packages is not writeable". The script keeps running, but a few lines later I get the following error:

Installing collected packages: sushigo
  Running setup.py develop for sushigo
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/app/environments/sushigo/setup.py'"'"'; __file__='"'"'/app/environments/sushigo/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps --user --prefix=
         cwd: /app/environments/sushigo/
    Complete output (4 lines):
    running develop
    running egg_info
    creating sushigo.egg-info
    error: could not create 'sushigo.egg-info': Permission denied
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/app/environments/sushigo/setup.py'"'"'; __file__='"'"'/app/environments/sushigo/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps --user --prefix= Check the logs for full command output.

Has anyone else experienced this?

opened by sovnheim 0

Geschenkt
Adding new game - Geschenkt (aka No Thanks https://boardgamearena.com/gamepanel?game=nothanks)

adding functionality to manually update the game state

adding recommendation as a flag so that it works even when all humans are playing

changing continue game logic to ensure it still works when the player doesn't change after an AI turn
opened by davidADSP 0

Unable to install the environment due to connection errors

I am unable to install the environment needed to start testing the SIMPLE features. I need to solve this problem before starting to develop my own custom board game project.

opened by JavierChicoOfc 1

Question: Does SIMPLE support board games with simultaneous action selection?

Hi,

I am new to ML but have been researching MuZero. Thanks to your article I found SIMPLE. My question is: is it possible to create a policy network that takes into account that both players choose a move at the same time? Because the next state relies on both players' choices, there is some nuance in the policy network: it has to account for the opponents' choices, and there might be a dependency between them.

I am not sure if it is built out of the box or if there is any research in this regard.

Thanks

opened by moscoso 0

Exporting to TF SavedModel/TFLite
I'm trying to export the resulting model to TFLite so I can run inference on another device, but I'm hitting some issues. I found instructions on how to export a model in the Stable Baselines documentation and tried adapting it for PPO1 instead of PPO2, however when I try and load the resulting SavedModel I get an exception about the Tensor not existing.

Here's the code:

ppo_model = load_model(env, 'best_model.zip')
tf.saved_model.simple_save(ppo_model.sess, "TEST_OUTPUT", inputs={"obs": ppo_model.policy_pi.obs_ph}, outputs={"action": ppo_model.policy_pi._policy_proba})
converter = tf.lite.TFLiteConverter.from_saved_model("TEST_OUTPUT")
tflite_model = converter.convert()

And the full error message: KeyError: "The name 'input/Ob:0' refers to a Tensor which does not exist. The operation, 'input/Ob', does not exist in the graph."

I've verified that ppo_model is being loaded correctly by running the inference (using ppo_model.action_probability()), so I don't believe there's an issue there. The SavedModel directory does get created on the tf.saved_model.simple_save step, however I believe it may not be a complete export as the size is very small.

I'm rather new to the ML side of things, so there might be something obvious that I'm missing, so any help would be greatly appreciated!

Thanks for putting together this great library!
opened by maciel310 0

Training Flamme Rouge model for 5 players stops

When I try to train my frouge model after setting the number of players to 5, the training starts. However, when it reaches "Optimizing...", it just stops the training and returns to the command prompt. Any ideas on how this can be fixed would be a great help. Thanks!

opened by celfcelfcelfcelf 5

Add support for Apple M1 chip

The current setup in docker-compose.yml doesn't allow the environment to be set up on a Mac with an M1 chip. This MR takes care of this problem.

opened by ROZBEH 0
