MazeRL is an application-oriented Deep Reinforcement Learning (RL) framework

Overview

Applied Reinforcement Learning with Python

MazeRL is an application-oriented Deep Reinforcement Learning (RL) framework for addressing real-world decision problems. Our vision is to cover the complete development life cycle of RL applications, ranging from simulation engineering to agent development, training and deployment.

This is a preliminary, non-stable release of Maze. It is not yet complete, and not all of our interfaces have settled. Hence, there may be breaking changes on our way towards the first stable release.

Spotlight Features

Below we list a few selected Maze features.

  • Design and visualize your policy and value networks with the Perception Module. It is based on PyTorch and provides a large variety of neural network building blocks and model styles. Quickly compose powerful representation learners from building blocks such as dense, convolutional, graph-convolutional and attention layers, recurrent architectures, action and observation masking, and self-attention.
  • Set up efficient RL training without writing boilerplate code, e.g. through built-in support for best practices such as observation pre-processing and normalization.
  • Maze supports advanced environment structures reflecting the requirements of real-world industrial decision problems, such as multi-step and multi-agent scenarios. You can, of course, also work with existing Gym-compatible environments.
  • Use the provided Maze trainers (A2C, PPO, IMPALA, SAC, Evolution Strategies), which support dictionary action and observation spaces as well as multi-step training with auto-regressive policies. Or stick to your favorite tools and trainers by combining Maze with other RL frameworks.
  • Out-of-the-box support for advanced training workflows such as imitation learning from teacher policies and policy fine-tuning.
  • Keep even complex application and experiment configuration manageable with the Hydra Config System (a command-line sketch follows this list).
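
As a taste of how training runs are launched, trainers, models and environments are selected as Hydra config groups on the maze-run command line. The sketch below mirrors the tutorial command discussed in the comments further down; the tutorial_cutting_2d_basic config values come from the step-by-step tutorial project and are not part of a plain Maze installation.

    maze-run -cn conf_train env=tutorial_cutting_2d_basic wrappers=tutorial_cutting_2d_basic \
        model=tutorial_cutting_2d_basic algorithm=ppo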

Get Started

  • Make sure PyTorch is installed, then get the latest released version of Maze as follows:

    pip install -U maze-rl
    
    # optionally install RLlib if you want to use it in combination with Maze
    pip install ray[rllib] tensorflow  
    

    Read more about other options, such as installing the latest development version.

    We encourage you to start with Python 3.7, as many popular environments like Atari or Box2D cannot easily be installed on newer Python versions. Maze itself supports newer Python versions, but for Python 3.9 you might have to install additional binary dependencies manually.

  • To see Maze in action, check out a first example (a minimal Python sketch follows after this list).

  • For a more applied introduction, visit the step-by-step tutorial.
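
If you prefer to stay in Python, the RunContext API (see the release notes below) wraps the same workflow. The following is a minimal sketch only: the import paths, the GymMazeEnv wrapper and the argument names are assumptions based on the documented Python API and should be checked against the first example for your installed version.

    # minimal RunContext sketch -- import paths and argument names are assumed
    from maze.api.run_context import RunContext
    from maze.core.wrappers.maze_gym_env_wrapper import GymMazeEnv

    # wrap a Gym-compatible environment and train a PPO policy on it
    rc = RunContext(env=lambda: GymMazeEnv("CartPole-v0"), algorithm="ppo")
    rc.train(n_epochs=1)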


Learn more about Maze

The documentation is the starting point to learn more about the underlying concepts; most importantly, it also provides code snippets and minimal working examples to get you started quickly.

License

Maze is freely available for research and non-commercial use. A commercial license is available; if interested, please contact us via our company website or write us an email.

We believe in Open Source principles and aim to transition Maze to a commercial Open Source project, releasing larger parts of the framework under a permissive license in the near future.

Comments
  • Configuration problems in the step-by-step tutorial

    I've just been trying out Maze and worked through the step-by-step tutorial.

    In Step 5 (5. Training the MazeEnv) the instructions are incomplete or wrong.

    I was able to get it running in the end, but it took (us) quite some time. I'm not sure if this is a bug in Maze or Hydra, or if some newer version of either library just changes the behavior a little bit. Either way, you should update the documentation so that it works out of the box for new users of the library.


    The setup (under Ubuntu 20.04):

    >> mkdir maze5 && cd maze5
    >> pyenv local 3.8.8
    >> python -m venv .venv
    >> source .venv/bin/activate
    >> pip install maze-rl torch
    >> pip list
    Package                 Version
    ----------------------- -----------
    hydra-core              1.1.0
    hydra-nevergrad-sweeper 1.1.5
    maze-rl                 0.1.7
    torch                   1.9.0
    ...
    

    Then I just copy-pasted the files from the https://github.com/enlite-ai/maze-examples/tree/main/tutorial_maze_env/part03_maze_env repo and adjusted the _target_ paths in the config YAMLs (e.g. from _target_: tutorial_maze_env.part03_maze_env.env.maze_env.maze_env_factory to _target_: env.maze_env.maze_env_factory).
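
    For reference, the adjustment boils down to changing the module path in each affected config file. A sketch of one of them (the file name and any other keys are illustrative; only the _target_ values are the ones quoted above):

    # e.g. conf/env/tutorial_cutting_2d_basic.yaml (file name illustrative)
    # before, as shipped in maze-examples:
    #   _target_: tutorial_maze_env.part03_maze_env.env.maze_env.maze_env_factory
    # after, relative to the copied project root:
    _target_: env.maze_env.maze_env_factory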

    Problem 1:

    When you run the suggested training command, Hydra will just complain that it can't find the configuration files.

    >> maze-run -cn conf_train env=tutorial_cutting_2d_basic wrappers=tutorial_cutting_2d_basic \
        model=tutorial_cutting_2d_basic algorithm=ppo
    In 'conf_train': Could not find 'model/tutorial_cutting_2d_basic'
    
    Available options in 'model':
            flatten_concat
            flatten_concat_shared_embedding
            pixel_obs
            pixel_obs_rnn
            rllib
            vector_obs
            vector_obs_rnn
    Config search path:
            provider=hydra, path=pkg://hydra.conf
            provider=main, path=pkg://maze.conf
            provider=schema, path=structured://
    

    Fix:

    You can just pass the config directory to Hydra with maze-run -cd conf -cn conf_train .... Hydra will then find the three config files and load them correctly (full command below).
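
    Putting the fix together with the original command, the full invocation that then works is:

    >> maze-run -cd conf -cn conf_train env=tutorial_cutting_2d_basic wrappers=tutorial_cutting_2d_basic \
        model=tutorial_cutting_2d_basic algorithm=ppo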

    Problem 2:

    After loading the config files, Hydra tries to load the modules defined in the _target_ fields. That fails immediately with:

      ...
      File "***/maze5-uWAZh5bh/lib/python3.8/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 104, in _resolve_target
        return _locate(target)
      File "***/maze5-uWAZh5bh/lib/python3.8/site-packages/hydra/_internal/utils.py", line 563, in _locate
        raise ImportError(f"Error loading module '{path}'") from e
    
    ImportError: Error loading module 'env.maze_env.maze_env_factory'
    

    Fix:

    For some reason Hydra doesn't know the path to the directory from which we call maze-run, and therefore it doesn't find the env directory containing the maze_env file.

    This is fixable by just setting the environment variable: export PYTHONPATH="$PYTHONPATH:$PWD/".
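
    Alternatively, if you don't want to export the variable permanently, you can scope the same fix to a single invocation by prefixing the command:

    >> PYTHONPATH="$PWD" maze-run -cd conf -cn conf_train env=tutorial_cutting_2d_basic \
        wrappers=tutorial_cutting_2d_basic model=tutorial_cutting_2d_basic algorithm=ppo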

    Labels: bug, documentation
    opened by jakobkogler 2
  • Hello from Hydra :)

    Thanks for using Hydra! I see that you are already using Hydra 1.1, which is great. One thing that is really recent is the ability to configure the config search path from the primary config. You can learn about it here.

    This can probably eliminate the need for your users to even know what a ConfigSearchpathPlugin is.
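
    For readers following along: in Hydra 1.1 the search path can be extended directly from the primary config via the hydra.searchpath key. A hedged sketch (the entry depends on where your configs live, e.g. a file:// path or a pkg:// module path):

    # added to the primary config, e.g. conf_train.yaml (illustrative)
    hydra:
      searchpath:
        - file://conf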

    Feel free to jump into the Hydra chat if you have any questions.

    opened by omry 2
  • Version 0.1.7

    • Adds Soft Actor-Critic (SAC) Trainer (supporting Dictionary Observations and Actions)
    • Simplifies the reward aggregation interface (now also supports multi-agent training)
    • Extends PPO and A2C to multi-agent capable actor-critic trainers (individual agents vs. centralized critic)
    • Adds option for custom rollout evaluators
    • Adds option for shared weights in actor-critic settings
    • Adds experiment and multi-run support for RunContext Python API
    opened by enliteai 0
  • Version 0.1.6

    Changes

    • made Maze compatible with RLlib 1.4
    • updated to the recently released Hydra 1.1.0
    • Simplified API (RunContext): Experiment and evaluation support
    • Fixed support for the Nevergrad sweeper: made the LocalLauncher Hydra plugin part of the wheel
    • Replaced the (policy id, actor id) tuple with an ActorID class

    Other

    • various documentation improvements
    • added ready-to-go Docker containers
    • contribution guidelines, pull request templates etc. on GitHub
    opened by md-enlite 0
  • Version 0.1.5

    Features:

    • Adds documentation for run_context
    • Changes simulated environment interface: step_without_observation -> fast_step
    • Adds seeding to environments, models and trainers
    • Initial commit of the Maze Python API
    • Adds an ExportGifWrapper
    • Adds network architecture visualizations to Tensorboard Images
    • adds incremental min/max stats
    • adds categorical (support-based) value networks
    • added value transformations
    opened by md-enlite 0
  • Towards Version 0.1.5

    • Adds seeding to environments, models and trainers
    • Initial commit of the Maze Python API
    • Adds an ExportGifWrapper
    • Adds network architecture visualizations to Tensorboard Images
    opened by md-enlite 0
  • Release Version 0.1.4

    • improved docs
    • switch to RLlib version 1.3.0.
    • full structured env support
      • policy interface now selects policy based on actor_id
    • added testing dependencies to main package
    opened by enliteai 0
  • Dev

    • adds PointNetFeatureBlock to perception module
    • adds Tensorboard hyper parameter visualization for Hydra multiruns
    • merges parallel and sequential dataset into a single InMemoryDataset
    opened by md-enlite 0
  • Version 0.1.3

    Improvements:

    • Enable event collection from within the Wrapper stack
    • Aligned StepSkipWrapper with the event system
    • MonitoringWrapper: Logging of observations, actions and rewards throughout the wrapper stack, useful for diagnosis
    • Make _recursive_ in Hydra config files compatible with Maze object instantiation
    opened by enliteai 0
  • Version 0.1.2

    Features:

    • Imitation Learning:
      • Added Evaluation Rollouts
      • Unified dataset structures (InMemoryDataset)
    • GlobalPoolingBlock: now supports sum and max pooling
    • ObservationNormalizationWrapper: Adds observation and observation distribution visualization to Tensorboard logging.
    • Distribution: Introduced VectorEnv, refactored the single and multi process parallelization wrappers.
    opened by enliteai 0
  • Dev

    Features:

    • hyper parameter optimization via grid search and Nevergrad
    • plain python training example
    • local hydra job launcher
    • extend attention/transformer perception blocks

    Fixes:

    • cumulative stats logging
    opened by md-enlite 0
Releases(v0.2.0)
  • v0.2.0(Nov 21, 2022)

    • New graph neural network building blocks (message passing based on torch-scatter in addition to existing graph convolutions)
    • Support for action recording, replay from pre-computed action records and feature collection.
    • Improved wrapper hierarchy semantics: Previously values were assigned to the outermost wrapper. Now values are assigned to existing attributes by traversing the wrapper hierarchy.
    • Removal of deprecated modules (APIContext and Maze models for RLlib)
    • Reflecting changes in upstream dependencies (Gym version pinned to <0.23)
  • v0.1.8(Dec 13, 2021)

  • v0.1.7(Jun 24, 2021)

    • Adds Soft Actor-Critic (SAC) Trainer (supporting Dictionary Observations and Actions)
    • Simplifies the reward aggregation interface (now also supports multi-agent training)
    • Extends PPO and A2C to multi-agent capable actor-critic trainers (individual agents vs. centralized critic)
    • Adds option for custom rollout evaluators
    • Adds option for shared weights in actor-critic settings
    • Adds experiment and multi-run support for RunContext Python API
    • Compatibility with PyTorch 1.9
  • v0.1.6(Jun 14, 2021)

    Changes

    • made Maze compatible with RLlib 1.4
    • updated to the recently released Hydra 1.1.0
    • Simplified API (RunContext): Experiment and evaluation support
    • Fixed support for the Nevergrad sweeper: made the LocalLauncher Hydra plugin part of the wheel
    • Replaced the (policy id, actor id) tuple with an ActorID class

    Other

    • various documentation improvements
    • added ready-to-go Docker containers
    • contribution guidelines, pull request templates etc. on GitHub
  • v0.1.5(May 20, 2021)

    Features:

    • adds RunContext (Maze Python API)
    • adds seeding to environments, models and trainers
    • changes simulated environment interface: step_without_observation -> fast_step

    Improvements:

    • adds an ExportGifWrapper
    • adds network architecture visualizations to Tensorboard Images
    • adds incremental min/max stats
    • adds categorical (support-based) value networks
    • adds value transformations
  • v0.1.4(Apr 29, 2021)

    • switch to RLlib version 1.3.0.
    • full structured env support
      • policy interface now selects policy based on actor_id
      • interfaces support collaborative multi-agent actor critic
    • improved docs
    • added testing dependencies to main package
  • v0.1.3(Apr 1, 2021)

    Improvements:

    • Enable event collection from within the Wrapper stack
    • Aligned StepSkipWrapper with the event system
    • MonitoringWrapper: Logging of observations, actions and rewards throughout the wrapper stack, useful for diagnosis
    • Make _recursive_ in Hydra config files compatible with Maze object instantiation
  • v0.1.2(Mar 25, 2021)

    Features:

    • Imitation Learning:
      • Added Evaluation Rollouts
      • Unified dataset structures (InMemoryDataset)
    • GlobalPoolingBlock: now supports sum and max pooling
    • ObservationNormalizationWrapper: Adds observation and observation distribution visualization to Tensorboard logging.
    • Distribution: Introduced VectorEnv, refactored the single and multi process parallelization wrappers.
  • v0.1.1(Mar 18, 2021)

    Features:

    • hyper parameter optimization via grid search and Nevergrad
    • plain python training example
    • local hydra job launcher
    • extend attention/transformer perception blocks
    • adds MazeEnvMonitoringWrapper as a default to wrapper stacks

    Fixes:

    • cumulative stats logging
  • v0.1.0(Mar 11, 2021)

    Documentation updates:

    • Integrating existing Gym environments
    • Factory documentation
    • Experiments workflow, ...

    Updated to Hydra 1.1.0:

    • Using Hydra.instantiate instead of custom registry implementation

    Added Rollout evaluator

Owner
EnliteAI GmbH
enliteAI is a machine learning company developing the Reinforcement Learning framework Maze.