A library of multi-agent reinforcement learning components and systems

Mava: a research framework for distributed multi-agent reinforcement learning

Table of Contents

  1. Overview
  2. Getting Started
  3. Supported Environments
  4. System Implementations
  5. Usage
  6. Installation
  7. Debugging
  8. Roadmap
  9. Contributing
  10. Troubleshooting and FAQs

Mava is a library for building multi-agent reinforcement learning (MARL) systems. Mava provides useful components, abstractions, utilities and tools for MARL, and allows for simple scaling to multi-process system training and execution, while providing a high level of flexibility and composability.

πŸ‘·β€β™€οΈ NOTICE: Our release of Mava is foremost to benefit the wider community and make it easier for researchers to work on MARL. However, we consider this release a Beta version of Mava. As with many frameworks, Mava is (and will probably always remain) a work in progress and there is much more the team aims to provide and improve in future releases. From incorporating the latest research and innovations to making the framework more stable, robust and well tested. Furthermore, we are committed and will do our best to keep everything working and have the experience of using Mava be as pleasant as possible. During Beta development breaking changes may occur as well as significant design changes (if we feel it could greatly improve the useability of the framework) but these will be clearly communicated before being incorporated into the codebase. It is also inevitable that there might be bugs we are not aware of and that things might break from time to time. We will do our best to fix these bugs and address any issues as quickly as possible. ⭐

Overview

Systems and the Executor-Trainer Paradigm

At the core of the Mava framework is the concept of a system. A system refers to a full multi-agent reinforcement learning algorithm consisting of the following specific components: an Executor, a Trainer and a Dataset.

The Executor is the part of the system that interacts with the environment, takes actions for each agent and observes the next state as a collection of observations, one for each agent in the system. Essentially, executors are the multi-agent version of the Actor class in Acme and are constructed by feeding the executor a dictionary of policy networks. The Trainer is responsible for sampling data from the Dataset (originally collected by the executors) and updating the parameters of every agent in the system. Trainers are therefore the multi-agent version of the Learner class in Acme. The Dataset stores the information collected by the executors as collections of dictionaries for the actions, observations and rewards, with keys corresponding to the individual agent ids. The basic system design is shown on the left in the above figure. Several examples of system implementations can be viewed here.
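
To make the per-agent structure concrete, the snippet below sketches these dictionaries in plain Python with made-up agent ids and a trivial stand-in policy. It is purely illustrative and does not use Mava's actual classes.

# Illustrative only: the per-agent dictionaries described above,
# with made-up agent ids and a trivial stand-in policy.
observations = {"agent_0": [0.1, 0.4], "agent_1": [0.3, 0.2]}

def stand_in_policy(agent_id, observation):
    # A real executor would query this agent's policy network here.
    return 0

# The executor selects an action for every agent in the system.
actions = {
    agent_id: stand_in_policy(agent_id, obs)
    for agent_id, obs in observations.items()
}

# After stepping the environment, rewards come back keyed by the same agent
# ids, and (observations, actions, rewards) are written to the dataset.
rewards = {"agent_0": 1.0, "agent_1": 0.5}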

Distributed System Training

Mava shares much of the design philosophy of Acme for the same reason: to allow a high level of composability for novel research (i.e. building new systems) as well as making it possible to scale systems in a simple way, using the same underlying multi-agent RL system code. Mava uses Launchpad for creating distributed programs. In Mava, the system executor (which is responsible for data collection) is distributed across multiple processes each with a copy of the environment. Each process collects and stores data which the Trainer uses to update the parameters of all the actor networks used within each executor. This approach to distributed system training is illustrated on the right in the figure above. βœ‹ NOTE: In the near future, Mava aims to support additional training setups, e.g. distributed training using multiple trainers to support Bayesian optimisation or population based training (PBT).

Getting Started

We have a Quickstart notebook that can be used to quickly create and train your first Multi-Agent System. For more information on how to use Mava, please view our usage section.

Supported Environments

A given multi-agent system interacts with its environment via an EnvironmentLoop. This loop takes as input a system instance and a multi-agent environment instance which implements the DeepMind Environment API. Mava currently supports multi-agent environment loops and environment wrappers for the following environments and environment suites:

Examples pictured: MAD4PG on PettingZoo's Multi-Walker environment; VDN on the SMAC 3m map.
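
As a rough illustration of what an environment loop does, the sketch below runs a single episode by repeatedly feeding per-agent observations to the system and per-agent actions back to the environment. This is a simplified, framework-agnostic sketch; select_actions is an assumed method name, not necessarily Mava's EnvironmentLoop API.

# Simplified sketch of one episode over a dm_env-style environment.
# Not Mava's actual EnvironmentLoop; `select_actions` is an assumed method name.
def run_episode(environment, system):
    timestep = environment.reset()
    while not timestep.last():
        # The system returns a dictionary of actions keyed by agent id.
        actions = system.select_actions(timestep.observation)
        timestep = environment.step(actions)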

System Implementations

Mava includes several system implementations. Below we list these together with an indication of the maturity of the system using the following keys: 🟩 -- Tested and working well, 🟨 -- Running and training on simple environments, but not extensively tested and πŸŸ₯ -- Implemented but untested and yet to show clear signs of stable training.

  • 🟩 - Multi-Agent Deep Q-Networks (MADQN).
  • 🟩 - Multi-Agent Deep Deterministic Policy Gradient (MADDPG).
  • 🟩 - Multi-Agent Distributed Distributional Deep Deterministic Policy Gradient (MAD4PG).
  • 🟨 - Differentiable Inter-Agent Learning (DIAL).
  • 🟨 - Multi-Agent Proximal Policy Optimisation (MAPPO).
  • 🟨 - Value Decomposition Networks (VDN).
  • πŸŸ₯ - Monotonic value function factorisation (QMIX).

| Name   | Recurrent | Continuous | Discrete | Centralised training | Communication | Multi Processing |
| ------ | --------- | ---------- | -------- | -------------------- | ------------- | ---------------- |
| MADQN  | βœ”οΈ | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ |
| DIAL   | βœ”οΈ | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ |
| MADDPG | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ | βœ”οΈ |
| MAD4PG | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ | βœ”οΈ |
| MAPPO  | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ | βœ”οΈ |
| VDN    | ❌ | ❌ | βœ”οΈ | βœ”οΈ | ❌ | βœ”οΈ |
| QMIX   | ❌ | ❌ | βœ”οΈ | βœ”οΈ | ❌ | βœ”οΈ |

As we develop Mava further, we aim to have all systems well tested on a wide variety of environments.

Usage

To get a sense of how Mava systems are used we provide the following simplified example of launching a distributed MADQN system.

# Mava imports
from mava.systems.tf import madqn
from mava.components.tf.architectures import DecentralisedPolicyActor
from . import helpers

# Launchpad imports
import launchpad

# Distributed program
program = madqn.MADQN(
    environment_factory=helpers.environment_factory,
    network_factory=helpers.network_factory,
    architecture=DecentralisedPolicyActor,
    num_executors=2,
).build()

# Launch
launchpad.launch(
    program,
    launchpad.LaunchType.LOCAL_MULTI_PROCESSING,
)

The first two arguments to the program are environment and network factory functions. These helper functions are responsible for creating the networks for the system, initialising their parameters on the different compute nodes and providing a copy of the environment for each executor. The next argument num_executors sets the number of executor processes to be run. After building the program we feed it to Launchpad's launch function and specify the launch type to perform local multi-processing, i.e. running the distributed program on a single machine. Scaling up or down is simply a matter of adjusting the number of executor processes.
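
To give a sense of what these helpers provide, the sketch below shows the general shape of an environment factory and a network factory. The signatures and bodies here are illustrative assumptions rather than Mava's actual helpers module, and make_my_environment is a hypothetical placeholder; see the examples subdirectory for real factory code.

# Hypothetical sketch of the factory helpers; not Mava's actual helpers module.
import sonnet as snt  # Mava's TF systems build networks with Sonnet


def environment_factory(evaluation: bool = False):
    # Each executor process calls this to get its own copy of the environment.
    return make_my_environment(evaluation=evaluation)  # hypothetical constructor


def network_factory(environment_spec):
    # Build one Q-network per agent; the agent ids, layer sizes and dictionary
    # structure below are indicative only.
    num_actions = 5  # would normally be read from environment_spec
    return {
        "q_networks": {
            agent: snt.nets.MLP([64, 64, num_actions])
            for agent in ["agent_0", "agent_1"]
        }
    }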

For a deeper dive, take a look at the detailed working code examples found in our examples subdirectory which show how to instantiate a few MARL systems and environments.

Components

Mava provides several components to support the design of MARL systems such as different system architectures and modules. You can change the architecture to support a different form of information sharing between agents, or add a module to enhance system capabilities. Some examples of common architectures are given below.

In terms of components, you can, for example, update the MADQN system code above to use a communication module by wrapping the architecture fed to the system, as shown below.

from mava.components.tf.modules import communication

...

# Wrap architecture in communication module
communication.BroadcastedCommunication(
    architecture=architecture,
    shared=True,
    channel_size=1,
    channel_noise=0,
)

All modules in Mava aim to work in this way.
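
As a generic illustration of this wrapping pattern (hypothetical classes, not Mava's module interface), a module takes an existing architecture object and returns an enriched one, which is why modules can be stacked without changing the underlying system code:

# Generic sketch of the wrapper pattern used by modules; the classes and the
# create_system method below are hypothetical, not Mava's implementation.
class BaseArchitecture:
    def create_system(self):
        return {"policies": {}}


class CommunicationModule:
    def __init__(self, architecture, channel_size=1):
        self._architecture = architecture
        self._channel_size = channel_size

    def create_system(self):
        # Build the wrapped architecture first, then attach the extra pieces.
        system = self._architecture.create_system()
        system["channel_size"] = self._channel_size
        return system


wrapped = CommunicationModule(BaseArchitecture(), channel_size=1)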

Installation

We have tested Mava on Python 3.6, 3.7 and 3.8.

Docker (Recommended)

  1. Build the docker image using the following make command:

    make build
  2. Run an example:

    make run EXAMPLE=dir/to/example/example.py

    For example, make run EXAMPLE=examples/petting_zoo/sisl/multiwalker/feedforward/decentralised/run_mad4pg.py. Alternatively, run bash inside a Docker container with Mava installed using make bash, and from there run examples as follows: python dir/to/example/example.py.

    To run an example with TensorBoard viewing enabled, you can run

    make run-tensorboard EXAMPLE=dir/to/example/example.py

    and navigate to http://127.0.0.1:6006/.

  3. Install the multi-agent StarCraft II environment [Optional]: To install the environment, please run the provided bash script, which is a slightly modified version of the script found here.

    ./install_sc2.sh

    Or optionally install through Docker (each build downloads and installs StarCraft II, ~3.8 GB):

    make build
    make build_sc2
  4. Install 2D RoboCup environment [Optional]: To install the environment, please run the robocup docker build command after running the Mava docker build command.

    make build
    make build_robocup

Python virtual environment

  1. If not using Docker, we strongly recommend using a Python virtual environment to manage your dependencies in order to avoid version conflicts. Please note that since Launchpad only supports Linux-based OSes, a Python virtual environment will only work on those systems:

    python3 -m venv mava
    source mava/bin/activate
    pip install --upgrade pip setuptools
  2. To install the core libraries, including Reverb (our dataset storage):

    pip install id-mava
    pip install id-mava[reverb]

    Or for nightly builds:

    pip install id-mava-nightly
    pip install id-mava-nightly[reverb]
  3. To install dependencies for TensorFlow agents:

    pip install id-mava[tf]
  4. For distributed agent support:

    pip install id-mava[launchpad]
  5. To install example environments, such as PettingZoo:

    pip install id-mava[envs]
  6. NB: For Flatland, OpenSpiel and SMAC environments, installations have to be done separately. Flatland can be installed using:

    pip install id-mava[flatland]

    and for OpenSpiel, after ensuring that the right cmake and clang versions are installed as specified here:

    pip install id-mava[open_spiel]

    StarCraft II must be installed separately according to your operating system. To install the StarCraft II ML environment and associated packages, please follow the instructions on PySC2 to install the StarCraft II game files. Please ensure you have the required game maps (for both PySC2 and SMAC) extracted in the StarCraft II maps directory. Once this is done, you can install the packages for the single-agent case (PySC2) and the multi-agent case (SMAC).

    pip install pysc2
    pip install git+https://github.com/oxwhirl/smac.git
  7. For the 2D RoboCup environment, a local install has only been tested using the Ubuntu 18.04 operating system. The installation can be performed by running the RoboCup bash script while inside the Mava Python virtual environment.

    ./install_robocup.sh

We also have a list of optional installs for extra functionality, such as the use of Atari environments, environment wrappers, GPU support and agent episode recording.

Debugging

To test and debug new system implementations, we use a simplified version of the spread environment from the MPE suite. Debugging in MARL can be very difficult and time consuming, so it is important to use a debugging environment that is small, simple and fast, yet still able to clearly show whether a system is able to learn. An illustration of the debugging environment is shown on the right. Agents start at random locations and are assigned specific landmarks, which they attempt to reach in as few steps as possible. Rewards are given to each agent independently as a function of their distance to their landmark. The reward is normalised to be between 0 and 1, where 1 is given when the agent is directly on top of the landmark; the further an agent is from its landmark, the closer the reward gets to 0. Collisions between agents result in a reward of -1 for the colliding agents.

To test both discrete and continuous control systems, we provide two versions of the environment. In the discrete version, the action space for each agent consists of five actions: left, right, up, down and stand still. In the continuous version, the action space consists of real values bounded between -1 and 1 for the agent's acceleration in the x and y directions. Several examples of running systems on the debugging environment can be found here. Below we show results from some of our systems trained on the debugging environment.
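
To make the reward description concrete, the snippet below shows one possible shaping that is consistent with the text above; it is an illustrative assumption, not the exact function used by the debugging environment.

import math


# Illustrative reward consistent with the description above (not the exact
# implementation): 1 on top of the landmark, decaying towards 0 with distance,
# and -1 when the agent collides with another agent.
def agent_reward(distance_to_landmark: float, collided: bool) -> float:
    if collided:
        return -1.0
    return math.exp(-distance_to_landmark)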

Roadmap

We have big ambitions for Mava! πŸš€ But there is still much work to be done. We have a clear roadmap and wish list for expanding our system implementations and associated modules, improving testing and robustness, and providing support for cross-machine training. Please visit them using the links below and feel free to add your own suggestions!

In the slightly longer term, the Mava team plans to release benchmarking results for several different systems and environments, and to contribute a MARL-specific behavioural environment suite (similar to bsuite for single-agent RL) specifically engineered to study aspects of MARL such as cooperation and coordination.

Contributing

Please read our contributing docs for details on how to submit pull requests, our Contributor License Agreement and community guidelines.

Troubleshooting and FAQs

Please read our troubleshooting and FAQs guide.

Citing Mava

If you use Mava in your work, please cite the accompanying technical report:

@article{pretorius2021mava,
    title={Mava: A Research Framework for Distributed Multi-Agent Reinforcement Learning},
    author={Arnu Pretorius and Kale-ab Tessera and Andries P. Smit and Kevin Eloff
    and Claude Formanek and St John Grimbly and Siphelele Danisa and Lawrence Francis
    and Jonathan Shock and Herman Kamper and Willie Brink and Herman Engelbrecht
    and Alexandre Laterre and Karim Beguir},
    year={2021},
    journal={arXiv preprint arXiv:2107.01460},
    url={https://arxiv.org/pdf/2107.01460.pdf},
}
Comments
  • Evaluating model after training is done?

    Is there an easy way in Mava to load a trained model from the checkpoints, run it again and evaluate its performance?

    I haven't found any example of how to do this and can't find an easy way to do it just by looking at the code.

    • Is this already implemented in Mava?
    • If not, could you please point out a way for me to implement this?

    To make it clearer, the reason I need this is that I'm working with a model topology that allows a variable number of agents as input. This means I can train the model using a 3-agent environment, but after training is done I can use this trained model and run it on the same environment with more or fewer agents, and I want to evaluate the performance with a different number of agents.

    question 
    opened by mlanas 17
  • Training affected in development branches

    Problem

    develop and feature/mava-scaling branches seem to be taking longer to train the debugging environment example run_maddpg.py than the 0.1.0 release. The difference between 0.1.0 and develop is not that big, but the feature/mava-scaling one does seem to affect the training considerably.

    (Screenshots comparing TensorBoard runs for 0.1.0, develop and feature/mava-scaling were attached to the original issue.)

    Execution

    The tests were executed using docker. Each branch was cloned to a different directory and the 3 docker images were built.

    For each branch, the test was executed 3 times with the command:

    make run-tensorboard
    

    Note

    After cloning the 0.1.0 tag, the simple_spread.py file of the debugging environment was updated to incorporate the changes added in #288 so all the tests are executed in the same environment.

    bug 
    opened by mlanas 11
  • Feature/jax upgrade networks upgrade acme

    What?

    • Change mlp to layernormmlp in ppo to be consistent with our tf systems.
    • Upgrade acme, reverb, tf and launchpad (we will have to benchmark this).

    Why?

    How?

    Extra

    benchmark in progress size/XS 
    opened by KaleabTessera 9
  • Quickstart notebook: Run Multi-Agent DDPG System.

    I was playing around with the quickstart notebook but am getting this error on Run Multi-Agent DDPG System (I tried locally and on Colab):

    UnparsedFlagAccessError: Trying to access flag --lp_termination_notice_secs before flags were parsed.

    bug 
    opened by jbakams 9
  • PettingZoo simple_spread example doesn't learn

    Problem

    I'm running the PettingZoo simple_spread Mava example (run_maddpg.py) from the develop branch and the MeanEpisodeReturn does not improve.

    Is this the expected behaviour? Or should I maybe let it train longer?

    Execution

    make run-tensorboard EXAMPLE=examples/petting_zoo/mpe/simple_spread/feedforward/decentralised/run_maddpg.py
    
    question 
    opened by mlanas 8
  • Questions about multiwalker

    Hey, a quick question: how many timesteps did you train multiwalker for with MAD4PG a few months ago, when you were able to learn it so effectively that the environment broke and you created an issue with us?

    question 
    opened by jkterry1 7
  • Feature/Population Based Training

    What?

    Add the first example of population based training in Mava. This example uses the recurrent MAD4PG algorithm to train a population of 5 networks, using 5 trainers and 5 executors, on the debugging environment. The hyperparameters that are getting tuned are the discount factor, target update rate and the target update period. This PR will remain in draft form for now as it still needs to be tested in a more complicated environment for longer time periods.

    Why?

    Population based training allows for the joint optimisation of hyperparameters and network parameters in one training setting.

    How?

    Various hooks have been added inside the MADDPG system. A PBT wrapper has also been added. The PBT wrapper can now wrap a MADDPG or MAD4PG system and overwrite the appropriate hooks to add PBT to the system.

    Extra

    enhancement 
    opened by DriesSmit 7
  • Feature/Multiple trainers for MA-DDPG

    What?

    Implements a scaled-up version of MADDPG where multiple trainers can now be used with multiple executors. A centralised variable server is also implemented that absorbs the responsibilities of the counter node, trainer checkpointing and trainer variable source. The trainers and executors now read and write to the centralised variable source directly. A multiple trainer example is included where 3 trainers and 2 executors are used to train 3 non-weight sharing agents on the debugging environment.

    Why?

    Multiple trainers allow for the parallelisation of the trainer's tasks, just as is already done with executors. This also opens the door to hyperparameter tuning directly using Mava in future updates.

    How?

    Added a new Scaled MA-DDPG system that allows for the use of multiple trainers.

    Extra

    This PR uses changes proposed in updated-network-keys. Therefore that PR should be merged first. After that point, this PR can be moved out of the draft status.

    enhancement 
    opened by DriesSmit 7
  • Feature/starcraft wrapper

    What?

    Implement StarCraft II wrapper #113. Add installation instructions to README #188.

    Why?

    SCII is an important test-bed for RL/MARL agents. Specifically, SMAC is used for testing mixing agents etc.

    How?

    Implement SC2 wrapper in the style of the pettingzoo/debugging env wrappers. Pull some methods from the RLlib wrapper provided by SMAC.

    Extra

    This is untested for various reasons. I would basically like some experienced wrapper eyes on the file/progress πŸ˜„ πŸ‘οΈ.

    enhancement 
    opened by sgrimbly 7
  • [BUG] Remove nested tf.function

    Describe the bug Nested tf.function decorators are causing TF to constantly retrace, which could cause significant performance and memory issues.

    Additional context I think this bug crept in when we refactored our code to separate forward and backward passes.

    Possible Solution Remove tf.function decorator from the backward pass.

    This bug is also related to #77 and #346

    bug 
    opened by arnupretorius 6
  • feat: Checkpointer Component

    What?

    A Checkpointer Component for JAX systems.

    Why?

    Save variables to file and restore pretrained weights

    How?

    • Created a Checkpointer Component for JAX systems that uses ACME JAX checkpointer
    • Added a checkpointer unit test
    • Moved optimisers to an optimiser component
    • Initialised opt_states in the trainer component

    Extra

    • Closes: Create a Checkpointer Component for JAX systems
    • Updated parameter server tests as the checkpointer is no longer integrated into the param server
    • Modified the test systems to save the experiment data in a temp folder
    • Fixed a small bug to ensure that parameter client get and set keys are always disjoint sets
    • Renamed all optimZer to optimiSer in code :smile:
    • General refactor by removing unused imports and repeated code
    • Refactored tests to no longer say "separate_networks"
    • Added a constants.py file as per discussion with @DriesSmit
    • To follow in another PR:
      • Checkpointing JAX random states (to be discussed): https://github.com/instadeepai/Mava/issues/746
      • Checkpointing best parameters: https://github.com/instadeepai/Mava/issues/744
      • Documenting checkpointer: https://github.com/instadeepai/Mava/issues/749
    • New issues opened as a result of this investigation
      • https://github.com/instadeepai/Mava/issues/747
      • https://github.com/instadeepai/Mava/issues/748
    size/XXL 
    opened by AsadJeewa 5
  • [BUG] Quickstart example fails in Colab

    Describe the bug

    Problems when trying to run the quickstart.ipynb notebook

    To Reproduce

    Steps to reproduce the behavior:

    1. Visit https://colab.research.google.com/github/instadeepai/Mava/blob/develop/examples/quickstart.ipynb
    2. Run the "Install required packages" cell
    3. Hit error in installing box-2d (note: this install fails quietly, because the output is %%captured)
    4. As a result, id-mava isn't installed and the later cells won't run

    Expected behavior

    The install should work without hiccup for any user trying the Colab notebook.

    Context (Environment)

    • OS: Google Colab – Release 2022/12/6

    Additional context

    n/a

    Possible Solution

    Common problem with box-2d: https://stackoverflow.com/questions/54252800/python-cant-install-box2d-swig-exe-failed-with-error-code-1, need to manually install swig first.

    bug 
    opened by callumtilbury 0
  • feat: support for TPU - sets environment variables correctly to use T…

    TPU support

    What?

    Changed environment variables in the lp_utils.to_device function so that only "nodes_on_gpu" can see the TPU and other nodes can only see the CPU. This allows the trainer to run on a TPU. Additionally, a new config parameter simply called "use_tpu" was added and threaded through the launcher.

    Why?

    This is due to launchpad processes crashing if more than one process tries to use a TPU.

    How?

    As stated in "What", the environment variables decide which platform JAX uses.

    Extra

    There is a slight problem when wanting to use a TPU. The base Python environment (the one calling the training script) needs to be set to only see the CPU, otherwise it will crash for the same reason as stated above. This is simple to do via export JAX_PLATFORMS="cpu". One thing that has not been considered in this PR is if someone wants to put certain nodes on the TPU and other nodes on the GPU, but that is quite fine-grained and can easily be added later down the line. It gets quite complicated, as a TPU can only have a single model running on it, so I'm also not sure how this will work for non-parameter-sharing situations, i.e. heterogeneous agents.

    size/M 
    opened by EdanToledo 4
  • [FEATURE] Add TPU support

    Please describe the purpose of the feature. Is it related to a problem?

    Hello, the title is pretty self-explanatory. I'd just like to add TPU support for Mava. Due to Launchpad, TPUs won't work with the code as is. I'm not sure if you have tried it yet, but the fix is pretty simple.

    Describe the solution you'd like

    Essentially, all that needs to be changed is the environment variables that are set in the lp_utils.to_device function. I've already written the code - it's about 4-6 lines, but I am unable to make a PR.

    Describe alternatives you've considered

    Crying and not running on a TPU.

    How do we know when implementation of this feature is complete?

    Checklist:

    • [X] Code runs on TPU.

    Additional context

    I am currently using mava on TPU so this is all that is needed to be done.

    enhancement 
    opened by EdanToledo 2
  • feat: make best checkpoint support norm params

    What?

    Make the bestcheckpoint component support the case of normalization

    Why?

    Currently the best checkpointer and the absolute metric features work fine in the default case; however, they don't support parameter normalisation, such as observation normalisation and target value normalisation.

    How?

    Edit the stored network in the best_checkpoint params

    Extra

    Close #859

    To test this feature:

    1. Run examples/smac/feedforward/decentralised/calculate_absolute_metric.py to check the logged json file
    2. Run examples/smac/feedforward/decentralised/best_checkpointed_net.py to check that it restores the best network
    enhancement size/M 
    opened by OmaymaMahjoub 1
  • Recurrent IPPO critic

    What?

    Added support for recurrent critics in IPPO. The system is working, learning, and leads to performance increases. The trainer works by using the initial RNN hidden state for training, instead of the network hidden state used while interacting with the environment.

    Why?

    Improve IPPO system performance.

    Extra

    For now, the batch size has to be passed in to mava/systems/ippo/networks.py for initialising the critic hidden states. The system performs best when value clipping, orthogonal network initialisation and all normalisation except observation normalisation are turned off. Additionally, an MSE loss should be used for the critic network for optimal performance.

    enhancement size/XXL 
    opened by RuanJohn 0
  • [MAINTAIN] Making the best checkpoint and the absolute metric support norm params

    Please describe what needs to be maintained?

    Currently the best checkpointer and the absolute metric features work fine in the default case; however, they don't support parameter normalisation, such as observation normalisation and target value normalisation.

    Describe the outcome you'd like

    Create the option of storing norm_params in the best checkpointer component

    maintenance 
    opened by OmaymaMahjoub 0
Releases
  • 0.1.3(Jun 15, 2022)

    Highlights

    This is the last tensorflow system release. After this, tensorflow systems will be deprecated in favour of Jax systems and our new callback redesign (https://github.com/instadeepai/Mava/pull/457).

    Systems

    • Updates to acme, reverb and tensorflow.
    • Working centralised and state based architectures.
    • Recurrent and Multiple Trainer PPO.

    Environments

    What's Changed

    • Bugfix/ Release aren't triggering pypi push job. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/466
    • Feature / Release 0.1.2 v2 by @KaleabTessera in https://github.com/instadeepai/Mava/pull/467
    • fix: Update black version. by @DriesSmit in https://github.com/instadeepai/Mava/pull/470
    • Bugfix/ Update PZ Version and new jax dockerfiles by @KaleabTessera in https://github.com/instadeepai/Mava/pull/480
    • Feature/recurrent and multiple trainer MAPPO by @DriesSmit in https://github.com/instadeepai/Mava/pull/326
    • Feat/maddpg obs optim by @AsadJeewa in https://github.com/instadeepai/Mava/pull/459
    • feat: Add fixed sampler capability + bugfixes by @DriesSmit in https://github.com/instadeepai/Mava/pull/475
    • Feature/fix sampler madqn by @EdanToledo in https://github.com/instadeepai/Mava/pull/477
    • chore: Up the patch version of mava - 0.1.3. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/485
    • Bugfix/fix old tf architectures by @KaleabTessera in https://github.com/instadeepai/Mava/pull/552
    • Release 0.1.3 by @KaleabTessera in https://github.com/instadeepai/Mava/pull/486

    Full Changelog: https://github.com/instadeepai/Mava/compare/0.1.2...0.1.3

  • 0.1.2(Mar 28, 2022)

    Highlights

    Systems

    • Fixed observation network bug in mappo + changed implementation to use two optims.
    • Fixes in maddpg/mad4pg loss calculation.
    • Began on jax system implementations.

    Environments

    What's Changed

    • Fix/add loss mask to ppo by @EdanToledo in https://github.com/instadeepai/Mava/pull/441
    • Mainetenance: Fix tf examples issues by @AsadJeewa in https://github.com/instadeepai/Mava/pull/444
    • fix: shared weights with agent type by @AsadJeewa in https://github.com/instadeepai/Mava/pull/428
    • Fix broken readme links and neaten up formatting by @AsadJeewa in https://github.com/instadeepai/Mava/pull/446
    • Feature/jax abstract builder class by @arnupretorius in https://github.com/instadeepai/Mava/pull/433
    • docs: updated docs to better represent available options by @sash-a in https://github.com/instadeepai/Mava/pull/448
    • Feature/jax general system class by @arnupretorius in https://github.com/instadeepai/Mava/pull/425
    • Bugfix/Mypy Inconsistency Issue by @KaleabTessera in https://github.com/instadeepai/Mava/pull/458
    • fix/remove flatland wrapper debug print statement by @mmorris44 in https://github.com/instadeepai/Mava/pull/456
    • Feature/MAPPO Obs Networks Fix + Multiple Optims by @KaleabTessera in https://github.com/instadeepai/Mava/pull/454
    • Feature/new issue template for investigations by @KaleabTessera in https://github.com/instadeepai/Mava/pull/461
    • Bugfix/MADD(4)PG by @DriesSmit in https://github.com/instadeepai/Mava/pull/460
    • Feat/Upped pypi version. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/464
    • Feature / Release 0.1.2 by @KaleabTessera in https://github.com/instadeepai/Mava/pull/465

    Full Changelog: https://github.com/instadeepai/Mava/compare/0.1.1...0.1.2

  • 0.1.1(Feb 25, 2022)

    Highlights

    Systems

    • Stable versions of all systems - notably stable mappo, vdn and qmix.
    • Multiple trainer implementations for maddpg and mad4pg.
    • Removed the dial system.

    Environments/ Environment Wrappers

    What's Changed

    • Feature/Enforce docstring code coverage. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/271
    • Chore/Resized gifs in readme. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/272
    • Feature/Improve Mava agent networks by @DriesSmit in https://github.com/instadeepai/Mava/pull/258
    • Feature/upgrade acme version and use new adders by @KaleabTessera in https://github.com/instadeepai/Mava/pull/274
    • Chore/Updated makefile and readme for Windows. by @Nashlen in https://github.com/instadeepai/Mava/pull/273
    • Fix/supersuit version by @KaleabTessera in https://github.com/instadeepai/Mava/pull/277
    • Chore/ Update quickstart by @KaleabTessera in https://github.com/instadeepai/Mava/pull/278
    • Feature/New acme adders and tests. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/276
    • feature: working version of importance sampling on feedforward madqn. by @jcformanek in https://github.com/instadeepai/Mava/pull/275
    • fix/ Smac Load by @KaleabTessera in https://github.com/instadeepai/Mava/pull/283
    • update Dockerfile for SMAC installation by @mnguyen0226 in https://github.com/instadeepai/Mava/pull/286
    • Bugfix: Simple_spread observation code. by @DriesSmit in https://github.com/instadeepai/Mava/pull/288
    • Bugfix/launchpad flag issue by @KaleabTessera in https://github.com/instadeepai/Mava/pull/291
    • Feature/mava reproducibility and PZ wrapper fix by @KaleabTessera in https://github.com/instadeepai/Mava/pull/296
    • fix: Autorom manual download. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/300
    • Feature: Add Readme for setting up a new environment by @DriesSmit in https://github.com/instadeepai/Mava/pull/299
    • Chore/re add autorom by @KaleabTessera in https://github.com/instadeepai/Mava/pull/302
    • Add checkpoint save interval variable. by @DriesSmit in https://github.com/instadeepai/Mava/pull/301
    • Feature/Upgraded tf and reverb versions. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/303
    • Chore/flatland gif by @arnupretorius in https://github.com/instadeepai/Mava/pull/304
    • Small readme updates by @arnupretorius in https://github.com/instadeepai/Mava/pull/305
    • Feature: added rendering to flatland wrapper. by @jcformanek in https://github.com/instadeepai/Mava/pull/307
    • chore/Updates for new acme version. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/308
    • Fix per agent loggers by @DriesSmit in https://github.com/instadeepai/Mava/pull/313
    • Removed deprecated shared_weights parameter by @mmorris44 in https://github.com/instadeepai/Mava/pull/319
    • docs: update README with correct link by @AsadJeewa in https://github.com/instadeepai/Mava/pull/320
    • small fix for README.md by @arnupretorius in https://github.com/instadeepai/Mava/pull/322
    • Feature/Multiple trainers for MA-DDPG by @DriesSmit in https://github.com/instadeepai/Mava/pull/253
    • Fix Flatland package error in Docker build by @DriesSmit in https://github.com/instadeepai/Mava/pull/328
    • Feature/melting pot by @ldfrancis in https://github.com/instadeepai/Mava/pull/324
    • Fix RoboCup environment wrapper by @DriesSmit in https://github.com/instadeepai/Mava/pull/334
    • Feature/eval intervals by @KaleabTessera in https://github.com/instadeepai/Mava/pull/323
    • Feature/ Smac wrapper Update, MADQN/QMIX/VDN upgrades and Dockerfile improvements by @KaleabTessera in https://github.com/instadeepai/Mava/pull/310
    • Feature/add robocup gif by @DriesSmit in https://github.com/instadeepai/Mava/pull/336
    • Feature/auto-push-docker-images and version upgrades by @KaleabTessera in https://github.com/instadeepai/Mava/pull/342
    • Added a brief explanation of Logging metrics to README by @RuanJohn in https://github.com/instadeepai/Mava/pull/341
    • Updated pip installation instructions in README by @RuanJohn in https://github.com/instadeepai/Mava/pull/343
    • Bugfix/dockerfile no module found by @KaleabTessera in https://github.com/instadeepai/Mava/pull/344
    • feat(git): Added feature and bug templates. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/350
    • Doc/meltingpot gif by @ldfrancis in https://github.com/instadeepai/Mava/pull/351
    • Replace types ParallelAdder with ReverbParallelAdder by @AsadJeewa in https://github.com/instadeepai/Mava/pull/356
    • Update README to link to pypi package by @AsadJeewa in https://github.com/instadeepai/Mava/pull/360
    • Feature/auto docs by @KaleabTessera in https://github.com/instadeepai/Mava/pull/354
    • Feature/maintainace issue template by @arnupretorius in https://github.com/instadeepai/Mava/pull/368
    • Fix/broken launchpad link by @sash-a in https://github.com/instadeepai/Mava/pull/370
    • feat: filter docker image push based on label. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/375
    • Maintenance/update readme by @arnupretorius in https://github.com/instadeepai/Mava/pull/378
    • chore: expand code owner list for better code review by @arnupretorius in https://github.com/instadeepai/Mava/pull/390
    • Feature/internal feature issue template by @arnupretorius in https://github.com/instadeepai/Mava/pull/379
    • Feature/internal bug issue template by @arnupretorius in https://github.com/instadeepai/Mava/pull/381
    • feat: benchmarking issue template by @arnupretorius in https://github.com/instadeepai/Mava/pull/385
    • Bugfix: Fixed conventional commit pre-commit hook not running. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/395
    • Fix/checklist for issue templates by @arnupretorius in https://github.com/instadeepai/Mava/pull/388
    • feat: internal issue tempalte for tests by @arnupretorius in https://github.com/instadeepai/Mava/pull/399
    • chore: add optional benchmark questions to feature by @arnupretorius in https://github.com/instadeepai/Mava/pull/401
    • Fix/madqn by @jcformanek in https://github.com/instadeepai/Mava/pull/362
    • Fix/architecture typo fix. by @RuanJohn in https://github.com/instadeepai/Mava/pull/410
    • Fix/Smac Wrapper Relies on Flatland Installation by @KaleabTessera in https://github.com/instadeepai/Mava/pull/413
    • refactor: move examples into tf folder and update examples links by @arnupretorius in https://github.com/instadeepai/Mava/pull/416
    • fix: readd quickstart notebook by @arnupretorius in https://github.com/instadeepai/Mava/pull/417
    • fix: Fix broken tests due to new gym version. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/421
    • Maintenance: Remove redundant value_network code by @AsadJeewa in https://github.com/instadeepai/Mava/pull/423
    • fix: small bug in the pettingzoo wrapper related to legal action masking by @jcformanek in https://github.com/instadeepai/Mava/pull/432
    • Fix/Flatland Docker Container by @KaleabTessera in https://github.com/instadeepai/Mava/pull/437
    • Feature/jax abstract system class by @arnupretorius in https://github.com/instadeepai/Mava/pull/405
    • Feature/ppo multiple train steps by @EdanToledo in https://github.com/instadeepai/Mava/pull/353
    • Fix/ Fix docs build. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/435
    • Feature/jax mava custom config class by @arnupretorius in https://github.com/instadeepai/Mava/pull/414
    • Feat/Release new mava version. by @KaleabTessera in https://github.com/instadeepai/Mava/pull/438
    • Merge: Merge Dev into Main for Release by @KaleabTessera in https://github.com/instadeepai/Mava/pull/439

    New Contributors

    • @Nashlen made their first contribution in https://github.com/instadeepai/Mava/pull/273
    • @mnguyen0226 made their first contribution in https://github.com/instadeepai/Mava/pull/286
    • @mmorris44 made their first contribution in https://github.com/instadeepai/Mava/pull/319
    • @AsadJeewa made their first contribution in https://github.com/instadeepai/Mava/pull/320
    • @RuanJohn made their first contribution in https://github.com/instadeepai/Mava/pull/341
    • @sash-a made their first contribution in https://github.com/instadeepai/Mava/pull/370
    • @EdanToledo made their first contribution in https://github.com/instadeepai/Mava/pull/353

    Full Changelog: https://github.com/instadeepai/Mava/compare/0.1.0...0.1.1

  • 0.1.0(Jul 6, 2021)

    Highlights

    Mava Core

    • Components

      • Architectures
        • Added Centralised, Decentralised, Networked and State Based Architectures.
      • Modules
        • Added Broadcast Communication, Epsilon Decay Scheduling, Additive and Monotonic Mixing and Fingerprint Stabilization.
      • Networks
        • Added Additive and Monotonic Mixing Networks, Hypernetworks, Communication Networks, Epsilon Greedy and DiscreteValued head.
    • Environment Loops

      • Added Parallel and Sequential Environment Loops.
    • Adders

      • Added Parallel versions of Transition, Sequential and Episode Adders.

    Systems

    • Added feedforward training for maddpg, mad4pg, madqn, mappo, vdn and qmix.
    • Added recurrent training for madqn, dial, maddpg and mad4pg.
    • Added continuous network heads for maddpg, mad4pg and mappo.
    • Added decentralised architecture training for maddpg, mad4pg, madqn, mappo, dial, vdn and qmix.
    • Added centralised architecture training for maddpg, mad4pg and mappo.
    • Added state based architecture training for maddpg and mad4pg.
    • Added networked architecture training for maddpg.

    Environments/ Environment Wrappers

    • Added PettingZoo, SMAC, RoboCup, OpenSpiel, Flatland, Debug Simple Spread, Debug Switch environment and Debug Two-Step game.

    Examples

    • Added quickstart notebook.
    • Added basic examples for sample systems and environments.

    Minor Changes and Fixes

  • 0.0.9(Jun 9, 2021)
