QDax: Accelerated Quality-Diversity
QDax is a tool to accelerate Quality-Diversity (QD) algorithms through hardware accelerators and massive parallelism.
QDax paper: https://arxiv.org/abs/2202.01258
Installation
Dependencies
In particular, QDax relies on the JAX and brax libraries. To install all dependencies, you can run the following command:
pip install -r requirements.txt
Installing QDax
pip install git+https://github.com/adaptive-intelligent-robotics/QDax.git
Examples
There are two ways to run QDax:
- Colab notebooks (visualization included) - recommended, as it also avoids downloading dependencies and configuring the environment. Open the notebook in the notebooks directory and run it according to the walkthrough instructions.
- Locally - A singularity folder is provided to easily install everything in a container. If you use the singularity image or install the dependencies locally, you can run a single experiment with, for example:
python run_qd.py --env_name walker --grid_shape 30 30 --batch_size 2048 --num-evaluations 1000000
Alternatively, to run experiments that compare the effect of batch sizes, use the command below. For example, to run the experiments on the walker environment (which has a 2-dimensional BD) with a grid shape of (30,30) and 5 replications:
python3 run_comparison_batch_sizes.py --env_name walker --grid_shape 30 30 -n 5
To restrict the run to one or more specific GPUs, set CUDA_VISIBLE_DEVICES:
CUDA_VISIBLE_DEVICES=0 python3 run_comparison_batch_sizes.py --env_name walker --grid_shape 30 30 -n 5
CUDA_VISIBLE_DEVICES="0,1" python3 run_comparison_batch_sizes.py --env_name walker --grid_shape 30 30 -n 5
Analysis and Plotting Tools
python3 analysis/plot_metrics.py --exp-name qdax_training --results ./qdax_walker_fixednumevals/ --attribute population_size --save figure.png
where:
- `--exp-name` is the name of the experiment directories (it will look for directories that start with that string).
- `--results` is the directory containing all the results folders.
- `--attribute` is the attribute on which to compare the results.
Code Structure (for developers)
One thing to note beforehand is that JAX relies on a functional programming paradigm. We try as much as possible to maintain this programming style.
The main file used is `qdax/training/qd.py`. This file contains the main `train` function, which consists of the entire QD loop and supporting functions.
- Inputs: the `train` function takes as input the task, the emitter and the hyperparameters.
- Functions: the main functions used by `train` are also declared in this file, listed in top-down order of importance. The key function here is `_es_one_epoch`. In terms of QD, it determines the loop performed at each generation: (1) selection (from the archive) and variation to generate solutions to be evaluated, defined by the `emitter_fn`; (2) evaluation; and (3) archive update, defined by `eval_and_add_fn`. The first part of the `train` function is `init_phase_fn`, which initializes the archive using random policies.
- Flow: `train` first calls `init_phase_fn` and then `_es_one_epoch` for a defined number of generations or evaluations.
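The flow above can be sketched as a self-contained toy. The function names (`train`, `init_phase_fn`, `_es_one_epoch`, `emitter_fn`, `eval_and_add_fn`) mirror the text, but every body below is a simplified stand-in, and all signatures are assumptions - the real implementations in `qdax/training/qd.py` differ:

```python
import jax
import jax.numpy as jnp
from typing import NamedTuple

class TrainingState(NamedTuple):
    key: jax.Array
    archive: jax.Array    # fitness per cell; -inf marks an empty cell
    solutions: jax.Array  # one stored solution per cell

def eval_and_add_fn(key_es_eval, archive, solutions, candidates):
    # (2) Evaluation and (3) archive update: keep a candidate if it beats
    # the fitness stored in its cell (toy: cell = row index, fitness = -sum|x|).
    # Evaluation is deterministic here, so key_es_eval goes unused; in the
    # real code it would seed stochastic rollouts.
    del key_es_eval
    fitness = -jnp.abs(candidates).sum(axis=1)
    better = fitness > archive
    archive = jnp.where(better, fitness, archive)
    solutions = jnp.where(better[:, None], candidates, solutions)
    return archive, solutions

def emitter_fn(key, solutions):
    # (1) Selection and variation: perturb the stored solutions.
    return solutions + 0.1 * jax.random.normal(key, solutions.shape)

def init_phase_fn(key, num_cells=8, dim=2):
    # Initialize the archive by evaluating random policies.
    key, key_init, key_eval = jax.random.split(key, 3)
    candidates = jax.random.normal(key_init, (num_cells, dim))
    archive = jnp.full((num_cells,), -jnp.inf)
    solutions = jnp.zeros((num_cells, dim))
    archive, solutions = eval_and_add_fn(key_eval, archive, solutions, candidates)
    return TrainingState(key, archive, solutions)

def _es_one_epoch(training_state):
    key, key_emitter, key_es_eval = jax.random.split(training_state.key, 3)
    candidates = emitter_fn(key_emitter, training_state.solutions)
    archive, solutions = eval_and_add_fn(
        key_es_eval, training_state.archive, training_state.solutions, candidates)
    return TrainingState(key, archive, solutions)

def train(seed=0, num_epochs=5):
    state = init_phase_fn(jax.random.PRNGKey(seed))
    for _ in range(num_epochs):
        state = _es_one_epoch(state)
    return state
```

Because the archive update only ever replaces a cell with a strictly better candidate, the per-cell fitness is monotonically non-decreasing over epochs, which is the invariant the real QD loop maintains as well.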
Notes
Key Management
key = jax.random.PRNGKey(seed)
key, key_model, key_env = jax.random.split(key, 3)
- `key` is for `training_state.key`
- `key_model` is for `policy_model.init`
- `key_env` is for environment initialisations (although in our deterministic case we do not really use this)
From the flow of the program, we perform an `init_phase` first. The `init_phase` function uses the `training_state.key` and outputs the updated `training_state` (with a new key) after performing the initialization (initializing the archive by evaluating random policies).
After this, we rely on the `training_state.key` in `es_one_epoch` being managed. In `es_one_epoch(training_state)`:
key, key_emitter, key_es_eval = jax.random.split(training_state.key, 3)
- `key_selection` is passed into the selection function
- `key_petr` is passed into the mutation function (iso_dd)
- `key_es_eval` is passed into `eval_and_add`
- `key` is saved as the new `training_state.key` for the next epoch, and the `training_state` is returned as an output of this function.
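The key-management pattern described above can be distilled into a minimal sketch: each epoch splits the carried key into one subkey per source of randomness and stores the fresh `key` back for the next epoch. The subkey names follow the split line above; what each subkey is consumed by is simplified here (real QDax passes them into the emitter and evaluation functions):

```python
import jax

def one_epoch(key):
    # Split the carried key: `key` is stored back for the next epoch,
    # the subkeys each feed one source of randomness this epoch.
    key, key_emitter, key_es_eval = jax.random.split(key, 3)
    # key_emitter would drive selection/variation (e.g. the iso_dd
    # mutation); key_es_eval would drive evaluation. Here we just
    # draw samples to stand in for those uses.
    mutation_noise = jax.random.normal(key_emitter, (2,))
    eval_noise = jax.random.normal(key_es_eval, (2,))
    return key, mutation_noise, eval_noise

key = jax.random.PRNGKey(0)
for _ in range(3):
    key, mutation_noise, eval_noise = one_epoch(key)
```

Because the carried key is re-split every epoch, each epoch sees fresh randomness, while the entire run stays reproducible from the single initial seed.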
Contributors
QDax is currently developed and maintained by the Adaptive & Intelligent Robotics Lab (AIRL):