WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU

Salesforce

Last update: Jan 6, 2023

WarpDrive is a flexible, lightweight, and easy-to-use open-source reinforcement learning (RL) framework that implements end-to-end multi-agent RL on a single GPU (Graphics Processing Unit).

Using the extreme parallelization capability of GPUs, WarpDrive enables orders-of-magnitude faster RL compared to CPU simulation + GPU model implementations. It is extremely efficient because it avoids back-and-forth data copying between the CPU and the GPU, and runs simulations across multiple agents and multiple environment replicas in parallel. Together, these allow the user to run thousands of concurrent multi-agent simulations and train on extremely large batches of experience, achieving over 100x the throughput of CPU-based counterparts.

Our current release includes several multi-agent environments based on the game of "Tag", where taggers are trying to run after and tag the runners. More environments will be added soon!

Below, we show multi-agent RL policies trained for different tagger:runner speed ratios using WarpDrive. These environments can run at millions of steps per second, and train in just a few hours, all on a single GPU!

WarpDrive also provides tools to build and train multi-agent RL systems quickly with just a few lines of code. Here is a short example to train tagger and runner agents:

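# Imports as used in the WarpDrive tutorials (module paths may differ between releases)
from example_envs.tag_continuous.tag_continuous import TagContinuous
from warp_drive.env_wrapper import EnvWrapper
from warp_drive.training.trainer import Trainer
from warp_drive.training.utils.data_loader import create_and_push_data_placeholders

# `run_config` is a configuration dictionary with (at least) "env" and "trainer" entries

# Create a wrapped environment object via the EnvWrapper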
# Ensure that use_cuda is set to True (in order to run on the GPU)
env_wrapper = EnvWrapper(
    TagContinuous(**run_config["env"]),
    num_envs=run_config["trainer"]["num_envs"], 
    use_cuda=True
)

# Agents can share policy models: this dictionary maps policy model names to agent ids.
policy_tag_to_agent_id_map = {
    "tagger": list(env_wrapper.env.taggers),
    "runner": list(env_wrapper.env.runners),
}

# Create the trainer object
trainer = Trainer(
    env_wrapper=env_wrapper,
    config=run_config,
    policy_tag_to_agent_id_map=policy_tag_to_agent_id_map,
)

# Create and push data placeholders to the device
create_and_push_data_placeholders(
    env_wrapper, 
    policy_tag_to_agent_id_map, 
    training_batch_size_per_env=trainer.training_batch_size_per_env
)

# Perform training!
trainer.train()
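
To make the policy-sharing map concrete, here is a minimal, self-contained sketch that does not import WarpDrive. The agent ids and the helper `invert_policy_map` are hypothetical and chosen only for illustration; in the example above the ids come from `env_wrapper.env.taggers` and `env_wrapper.env.runners`.

```python
from typing import Dict, List


def invert_policy_map(policy_to_agents: Dict[str, List[int]]) -> Dict[int, str]:
    """Return an agent id -> policy name lookup, rejecting duplicate assignments."""
    agent_to_policy: Dict[int, str] = {}
    for policy_name, agent_ids in policy_to_agents.items():
        for agent_id in agent_ids:
            if agent_id in agent_to_policy:
                raise ValueError(f"agent {agent_id} is assigned to two policies")
            agent_to_policy[agent_id] = policy_name
    return agent_to_policy


# Hypothetical ids: agents 0-1 are taggers, agents 2-4 are runners.
policy_tag_to_agent_id_map = {"tagger": [0, 1], "runner": [2, 3, 4]}
agent_to_policy = invert_policy_map(policy_tag_to_agent_id_map)
print(agent_to_policy)
```

Each policy name maps to a list of agent ids, so every agent in a list shares one model; the inverse lookup is just a convenient way to check that no agent was assigned to two policies by mistake.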

White Paper and Citing WarpDrive

You can find more details in our white paper: https://arxiv.org/abs/2108.13976.

If you're using WarpDrive in your research or applications, please cite using this BibTeX:

@misc{lan2021warpdrive,
      title={WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU}, 
      author={Tian Lan and Sunil Srinivasa and Stephan Zheng},
      year={2021},
      eprint={2108.13976},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Tutorials and Quick Start

Familiarize yourself with WarpDrive by running these tutorials on Colab!

A simple end-to-end RL training example: Explains how to get started with multi-agent RL training with just a few lines of code.
WarpDrive basics: Explains the basic host-side Python APIs that manage the CUDA data and kernel functions on the GPU.
WarpDrive sampler: Explains Python APIs controlling the GPU action sampler.
WarpDrive resetter and logger: Explains Python APIs controlling the GPU environment resetter and rollout history logger.
Create custom environments: Explains how to create your own custom RL environment in CUDA C, and integrate it with WarpDrive.
Training with WarpDrive: Explains how to train your environment on the GPU.

Note: You may also run these tutorials locally, but you will need a machine with an Nvidia GPU, a compatible Nvidia GPU driver, and the nvcc compiler installed. You will also need Jupyter; see https://jupyter.readthedocs.io/en/latest/install.html for installation instructions.

You can find full reference documentation here.

Installation Instructions

To get started, you'll need to have Python 3.7+ and the nvcc compiler installed with a compatible Nvidia GPU CUDA driver.

CUDA (which includes nvcc) can be installed by following Nvidia's instructions here: https://developer.nvidia.com/cuda-downloads.
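
As a quick sanity check of these prerequisites, the following sketch uses only the Python standard library to report whether the interpreter is new enough and whether `nvcc` and `nvidia-smi` can be found on the `PATH`. The function name `check_prerequisites` is hypothetical and not part of WarpDrive, and finding a tool on the `PATH` does not by itself prove that the driver and toolkit versions are compatible.

```python
import shutil
import sys
from typing import Dict, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 7)) -> Dict[str, object]:
    """Report the Python version check and the locations of the GPU tools, if any."""
    return {
        "python_ok": sys.version_info >= min_python,
        # shutil.which returns the full path to the executable, or None if not found
        "nvcc": shutil.which("nvcc"),
        "nvidia-smi": shutil.which("nvidia-smi"),
    }


report = check_prerequisites()
for name, value in report.items():
    print(f"{name}: {value if value is not None else 'NOT FOUND'}")
```

If `nvcc` is reported as not found even though CUDA is installed, the CUDA `bin` directory is probably missing from the `PATH`.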

Docker Image

You can refer to the example Dockerfile to configure your system. In particular, we suggest you visit Nvidia Docker Hub to download the CUDA and cuDNN images compatible with your system. You should be able to use the command line utility to monitor the NVIDIA GPU devices in your system:

nvidia-smi

and see output similar to this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P0    32W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

In this snapshot, you can see we are using a Tesla V100 GPU and CUDA version 11.0.
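
If you want to read these values programmatically, a small sketch like the one below extracts the driver and CUDA versions from the header line of the `nvidia-smi` output. The helper name `parse_nvidia_smi_header` is hypothetical; the sample string is the header line from the snapshot above. Note that the "CUDA Version" shown by `nvidia-smi` is the highest CUDA version the installed driver supports, which may differ from the version of the toolkit (and `nvcc`) you have installed.

```python
import re
from typing import Dict

# Matches e.g. "Driver Version: 450.51.06    CUDA Version: 11.0"
HEADER_PATTERN = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+CUDA Version:\s*(?P<cuda>[\d.]+)"
)


def parse_nvidia_smi_header(text: str) -> Dict[str, str]:
    """Extract the driver and CUDA version strings from nvidia-smi output."""
    match = HEADER_PATTERN.search(text)
    if match is None:
        raise ValueError("no 'Driver Version ... CUDA Version ...' header found")
    return match.groupdict()


sample = "| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |"
versions = parse_nvidia_smi_header(sample)
print(versions)
```

To use it on a live system, pass the text captured from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout` instead of the sample string.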

Installing using Pip

You can install WarpDrive using the Python package manager:

pip install rl_warp_drive

Installing from Source

Clone this repository to your machine:

git clone https://github.com/salesforce/warp-drive.git

Optional, but recommended for first tries: Create a new conda environment (named "warp_drive" below) and activate it:
```
conda create --name warp_drive python=3.7 --yes
conda activate warp_drive
```
Install as an editable Python package:
```
cd warp-drive
pip install -e .
```

Testing your Installation

To test your installation, try running from the root directory:

conda activate warp_drive
cd warp_drive/cuda_includes
make compile-test

Running make compile-test will compile the core service source code into a CUDA binary and place it in a bin folder, and additionally, run some unit tests.
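
Before compiling, it can also help to confirm that the package is visible to the active Python environment. This is a minimal sketch using only the standard library; the helper `is_installed` is hypothetical, and the import name `warp_drive` is assumed from the directory layout used above (the pip distribution name is `rl_warp_drive`).

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


print("warp_drive importable:", is_installed("warp_drive"))
```

A result of `False` usually means that the wrong conda environment is active or that the `pip install` step did not complete.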

Learn More

For more information, please check out our blog, white paper, and code documentation.

If you're interested in extending this framework, or have questions, join the AI Economist Slack channel using this invite link.

Comments

problem with running environment

Hi,

I need some help getting this to run. The issue happens in both Jupyter and PyCharm: a 'make bin file failed..' exception is raised over and over again. It is raised in the simple end-to-end example and also while running both tests built into the library. This is a fresh Conda environment.

Thanks for any suggestions

appdirs 1.4.4 pypi_0 pypi
atomicwrites 1.4.0 py_0
attrs 21.4.0 pyhd3eb1b0_0
backcall 0.2.0 pypi_0 pypi
blas 1.0 mkl
brotli 1.0.9 h2bbff1b_7
brotli-bin 1.0.9 h2bbff1b_7
ca-certificates 2022.07.19 haa95532_0
certifi 2022.6.15 py37haa95532_0
charset-normalizer 2.1.1 pypi_0 pypi
cloudpickle 2.1.0 pypi_0 pypi
colorama 0.4.5 py37haa95532_0
cycler 0.11.0 pyhd3eb1b0_0
debugpy 1.6.3 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
entrypoints 0.4 pypi_0 pypi
fonttools 4.25.0 pyhd3eb1b0_0
freetype 2.10.4 hd328e21_0
glib 2.69.1 h5dc1a3c_1
gst-plugins-base 1.18.5 h9e645db_0
gstreamer 1.18.5 hd78058f_0
gym 0.25.2 pypi_0 pypi
gym-notices 0.0.8 pypi_0 pypi
icu 58.2 ha925a31_3
idna 3.3 pypi_0 pypi
importlib-metadata 4.11.3 py37haa95532_0
importlib_metadata 4.11.3 hd3eb1b0_0
iniconfig 1.1.1 pyhd3eb1b0_0
intel-openmp 2021.4.0 haa95532_3556
ipykernel 6.15.1 pypi_0 pypi
ipython 7.34.0 pypi_0 pypi
jedi 0.18.1 pypi_0 pypi
jpeg 9e h2bbff1b_0
jupyter-client 7.3.4 pypi_0 pypi
jupyter-core 4.11.1 pypi_0 pypi
kiwisolver 1.4.2 py37hd77b12b_0
lerc 3.0 hd77b12b_0
libbrotlicommon 1.0.9 h2bbff1b_7
libbrotlidec 1.0.9 h2bbff1b_7
libbrotlienc 1.0.9 h2bbff1b_7
libclang 12.0.0 default_h627e005_2
libdeflate 1.8 h2bbff1b_5
libffi 3.4.2 hd77b12b_4
libiconv 1.16 h2bbff1b_2
libogg 1.3.5 h2bbff1b_1
libpng 1.6.37 h2a8f88b_0
libtiff 4.4.0 h8a3f274_0
libvorbis 1.3.7 he774522_0
libwebp 1.2.2 h2bbff1b_0
libxml2 2.9.14 h0ad7f3c_0
libxslt 1.1.35 h2bbff1b_0
lz4-c 1.9.3 h2bbff1b_1
mako 1.2.1 pypi_0 pypi
markupsafe 2.1.1 pypi_0 pypi
matplotlib 3.5.1 py37haa95532_1
matplotlib-base 3.5.1 py37hd77b12b_1
matplotlib-inline 0.1.6 pypi_0 pypi
mkl 2021.4.0 haa95532_640
mkl-service 2.4.0 py37h2bbff1b_0
mkl_fft 1.3.1 py37h277e83a_0
mkl_random 1.2.2 py37hf11a4ad_0
munkres 1.1.4 py_0
nest-asyncio 1.5.5 pypi_0 pypi
numpy 1.21.5 py37h7a0a035_3
numpy-base 1.21.5 py37hca35cd5_3
openssl 1.1.1q h2bbff1b_0
packaging 21.3 pyhd3eb1b0_0
parso 0.8.3 pypi_0 pypi
pcre 8.45 hd77b12b_0
pickleshare 0.7.5 pypi_0 pypi
pillow 9.2.0 py37hdc2b20a_1
pip 22.1.2 py37haa95532_0
platformdirs 2.5.2 pypi_0 pypi
pluggy 1.0.0 py37haa95532_1
ply 3.11 py37_0
prompt-toolkit 3.0.30 pypi_0 pypi
psutil 5.9.1 pypi_0 pypi
py 1.11.0 pyhd3eb1b0_0
pycuda 2021.1 pypi_0 pypi
pygments 2.13.0 pypi_0 pypi
pyparsing 3.0.4 pyhd3eb1b0_0
pyqt 5.15.7 py37hd77b12b_0
pyqt5-sip 12.11.0 py37hd77b12b_0
pytest 7.1.2 py37haa95532_0
python 3.7.13 h6244533_0
python-dateutil 2.8.2 pyhd3eb1b0_0
pytools 2022.1.12 pypi_0 pypi
pywin32 304 pypi_0 pypi
pyyaml 6.0 py37h2bbff1b_1
pyzmq 23.2.1 pypi_0 pypi
qt-main 5.15.2 he8e5bd7_7
qt-webengine 5.15.9 hb9a9bb5_4
qtwebkit 5.212 h3ad3cdb_4
requests 2.28.1 pypi_0 pypi
rl-warp-drive 1.6.7 pypi_0 pypi
setuptools 61.2.0 py37haa95532_0
sip 6.6.2 py37hd77b12b_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.39.2 h2bbff1b_0
tk 8.6.12 h2bbff1b_0
toml 0.10.2 pyhd3eb1b0_0
tomli 2.0.1 py37haa95532_0
torch 1.10.2 pypi_0 pypi
torchaudio 0.12.1+cu116 pypi_0 pypi
torchtext 0.11.2 pypi_0 pypi
torchvision 0.11.3 pypi_0 pypi
tornado 6.1 py37h2bbff1b_0
tqdm 4.64.0 pypi_0 pypi
traitlets 5.3.0 pypi_0 pypi
typing_extensions 4.3.0 py37haa95532_0
urllib3 1.26.11 pypi_0 pypi
vc 14.2 h21ff451_1
vs2015_runtime 14.27.29016 h5e58377_2
wcwidth 0.2.5 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0
wincertstore 0.2 py37haa95532_2
xz 5.2.5 h8cc25b3_1
yaml 0.2.5 he774522_0
zipp 3.8.0 py37haa95532_0
zlib 1.2.12 h8cc25b3_2
zstd 1.5.2 h19a0ad4_0

opened by DanielWit 9
PyTorch Lightning Trainer

Hey there,

First of all, thanks for this library, it looks great.

While reading through the codebase, it seems there is a lot of boilerplate. You might want to consider PyTorch Lightning: https://github.com/PyTorchLightning.

Best, T.C

opened by tchaton 7
Addition of Other Reinforcement Learning Algorithms (i.e., Q-Learning)

Dear WarpDrive Team,

May I find out if it is possible to implement other reinforcement learning algorithms into WarpDrive (i.e., Q-Learning)?

If not, may I ask whether PPO and A2C are considered among the better algorithms in the field? I am not that well informed about the algorithms and their individual advantages, but this is what I have gathered from online searches:

It can be observed that PPO provides a better convergence and performance rate than other techniques but is sensitive to changes. DQN alone is unstable and gives poor convergence, hence requires several add-ons.

Reference: https://medium.datadriveninvestor.com/which-reinforcement-learning-rl-algorithm-to-use-where-when-and-in-what-scenario-e3e7617fb0b1

opened by rllyryan 4
[Tutorial 5] 'Trainer' object has no attribute 'cuda_sample_controller'
Dear WarpDrive team,

In tutorial 5, for this particular line of code:

anim = generate_tag_env_rollout_animation(trainer)

I have encountered an error:

AttributeError: 'Trainer' object has no attribute 'cuda_sample_controller'

I have also tried to access this particular attribute in a separate cell but to no avail.

Like the previous issue regarding Tutorial 2, is there a solution for this?

Thank you and I look forward to your help!

Ryan :)
opened by rllyryan 4
'EnvWrapper' object has no attribute 'use_cuda'

Hi, thanks for your great contribution with warp_drive. I am following the tutorial tutorial-7-training_with_warp_drive_and_pytorch_lightning. I get no errors until the cell in which we use EnvWrapper:

I installed the exact packages listed in the tutorial but still get the error AttributeError: 'EnvWrapper' object has no attribute 'use_cuda'. Thanks for your support.

opened by Mshz2 3
a question for cuda env

The Python environment has a reset function, but the CUDA environment does not. Suppose I want each agent in each environment to start from a different initial position: how can I write that in CUDA?

opened by jqu314159 3
Some question about the environment
I think the idea of environment scheduling is very novel. Multi-environment and multi-agent are scheduled on GPU, which improves GPU utilization ratio. I have some questions about the tag-continuous:

Does "continuous" refer to a continuous action space? As far as I can see, the action space of tag-continuous is actually discrete.

Is there an example of the PPO algorithm with tag or gridworld?
opened by ghost 3
An issue with installation of warp-drive: Failed building wheel for pycuda
Hello all!

I followed the instructions in this repository to install warp-drive on my laptop:

Processor: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz 1.99 GHz
Installed RAM: 16.0 GB (15.9 GB usable)
System type: 64-bit operating system, x64-based processor
Edition: Windows 10 Pro

However, it gives the following error for "pycuda":

C:\Users\Aslan\AppData\Local\Temp\pip-install-6zudjbjm\pycuda_26af0be0537a4731b787cf7208c68c7e\src\cpp\cuda.hpp(14): fatal error C1083: Cannot open include file: 'cuda.h': No such file or directory

C:\Users\Aslan\AppData\Local\Temp\pip-build-env-22hj7b1u\overlay\Lib\site-packages\setuptools\command\build_py.py:153: SetuptoolsDeprecationWarning: Installing 'pycuda.cuda' as data is deprecated, please list it in `packages`.
!!

############################
# Package would be ignored #
############################
Python recognizes 'pycuda.cuda' as an importable package, but it is not listed in the `packages` configuration of setuptools.

'pycuda.cuda' has been automatically added to the distribution only because it may contain data files, but this behavior is likely to change in future versions of setuptools (and therefore is considered deprecated).

Please make sure that 'pycuda.cuda' is included as a package by using the `packages` configuration field or the proper discovery methods (for example by using `find_namespace_packages(...)`/`find_namespace:` instead of `find_packages(...)`/`find:`).

You can read more about "package discovery" and "data files" on setuptools documentation page.

!!
check.warn(importable)
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.26.28801\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pycuda
Failed to build pycuda
ERROR: Could not build wheels for pycuda, which is required to install pyproject.toml-based projects
WARNING: Ignoring invalid distribution -ffi (c:\users\aslan\anaconda3\envs\ai-economist\lib\site-packages)
WARNING: Ignoring invalid distribution -ffi (c:\users\aslan\anaconda3\envs\ai-economist\lib\site-packages)
WARNING: Ignoring invalid distribution -ffi (c:\users\aslan\anaconda3\envs\ai-economist\lib\site-packages)

As you see the problem is with pycuda.

Here are my installed packages:

Name Version Build Channel
absl-py 1.1.0 pypi_0 pypi
aiosignal 1.2.0 pypi_0 pypi
alabaster 0.7.12 py37_0 anaconda
astroid 2.9.0 py37haa95532_0 anaconda
astunparse 1.6.3 pypi_0 pypi
attrs 21.2.0 pypi_0 pypi
babel 2.9.1 pyhd3eb1b0_0 anaconda
backcall 0.2.0 pyhd3eb1b0_0 anaconda
beautifulsoup4 4.11.1 py37haa95532_0 anaconda
blas 1.0 mkl
bleach 4.1.0 pyhd3eb1b0_0 anaconda
brotlipy 0.7.0 py37h2bbff1b_1003 anaconda
ca-certificates 2022.4.26 haa95532_0 anaconda
cachetools 5.2.0 pypi_0 pypi
certifi 2022.6.15 py37haa95532_0 anaconda
cffi 1.15.0 pypi_0 pypi
chardet 4.0.0 py37haa95532_1003 anaconda
charset-normalizer 2.0.4 pyhd3eb1b0_0 anaconda
cloudpickle 2.0.0 pyhd3eb1b0_0 anaconda
colorama 0.4.4 pyhd3eb1b0_0 anaconda
cryptography 36.0.0 py37h21b164f_0 anaconda
cudatoolkit 11.6.0 hc0ea762_10 conda-forge
debugpy 1.5.1 py37hd77b12b_0 anaconda
decorator 5.0.9 pypi_0 pypi
defusedxml 0.7.1 pyhd3eb1b0_0 anaconda
distlib 0.3.4 pypi_0 pypi
docutils 0.17.1 py37haa95532_1 anaconda
entrypoints 0.4 py37haa95532_0 anaconda
filelock 3.7.1 pypi_0 pypi
flatbuffers 1.12 pypi_0 pypi
freetype 2.10.4 hd328e21_0
frozenlist 1.3.0 pypi_0 pypi
gast 0.4.0 pypi_0 pypi
google-auth 2.8.0 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.43.0 pypi_0 pypi
gym 0.21.0 pypi_0 pypi
icu 58.2 vc14hc45fdbb_0 [vc14] anaconda
idna 3.3 pyhd3eb1b0_0 anaconda
imageio 2.19.3 pypi_0 pypi
imagesize 1.3.0 pyhd3eb1b0_0 anaconda
importlib-metadata 4.11.3 py37haa95532_0 anaconda
importlib_metadata 4.11.3 hd3eb1b0_0 anaconda
importlib_resources 5.2.0 pyhd3eb1b0_1 anaconda
intel-openmp 2021.4.0 haa95532_3556
ipykernel 6.15.0 pypi_0 pypi
ipython 7.34.0 pypi_0 pypi
ipython_genutils 0.2.0 pyhd3eb1b0_1 anaconda
isort 5.9.3 pyhd3eb1b0_0 anaconda
jedi 0.18.0 pypi_0 pypi
jinja2 3.0.3 pyhd3eb1b0_0 anaconda
jpeg 9b vc14h4d7706e_1 [vc14] anaconda
jsonschema 4.4.0 py37haa95532_0 anaconda
jupyter-client 7.3.4 pypi_0 pypi
jupyter_client 7.2.2 py37haa95532_0
jupyter_core 4.10.0 py37haa95532_0
jupyterlab-pygments 0.2.2 pypi_0 pypi
jupyterlab_pygments 0.1.2 py_0 anaconda
keras 2.9.0 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
keyring 23.4.0 py37haa95532_0 anaconda
lazy-object-proxy 1.6.0 py37h2bbff1b_0 anaconda
libclang 14.0.1 pypi_0 pypi
libpng 1.6.37 h2a8f88b_0 anaconda
libtiff 4.2.0 hd0e1b90_0
libuv 1.40.0 he774522_0
libwebp 1.2.2 h2bbff1b_0
lz4-c 1.9.3 h2bbff1b_1
markdown 3.3.7 pypi_0 pypi
markupsafe 2.0.1 py37h2bbff1b_0 anaconda
matplotlib-inline 0.1.2 pyhd3eb1b0_2 anaconda
mccabe 0.7.0 pyhd3eb1b0_0 anaconda
mistune 0.8.4 py37hfa6e2cd_1001 anaconda
mkl 2021.4.0 haa95532_640
mkl-service 2.4.0 py37h2bbff1b_0
mkl_fft 1.3.1 py37h277e83a_0
mkl_random 1.2.2 py37hf11a4ad_0
msgpack 1.0.4 pypi_0 pypi
nbclient 0.5.13 py37haa95532_0 anaconda
nbconvert 6.4.4 py37haa95532_0 anaconda
nbformat 5.3.0 py37haa95532_0 anaconda
nest-asyncio 1.5.5 py37haa95532_0 anaconda
networkx 2.6.3 pypi_0 pypi
numpy 1.21.5 py37h7a0a035_3
numpy-base 1.21.5 py37hca35cd5_3
numpydoc 1.2 pyhd3eb1b0_0 anaconda
oauthlib 3.2.0 pypi_0 pypi
openssl 1.1.1o h2bbff1b_0 anaconda
opt-einsum 3.3.0 pypi_0 pypi
packaging 20.9 pypi_0 pypi
pandocfilters 1.5.0 pyhd3eb1b0_0 anaconda
parso 0.8.2 pypi_0 pypi
pickleshare 0.7.5 pyhd3eb1b0_1003 anaconda
pillow 9.0.1 py37hdc2b20a_0
pip 22.1.2 pypi_0 pypi
platformdirs 2.4.0 pyhd3eb1b0_0 anaconda
prompt-toolkit 3.0.18 pypi_0 pypi
psutil 5.8.0 py37h2bbff1b_1 anaconda
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pycodestyle 2.8.0 pyhd3eb1b0_0 anaconda
pycparser 2.20 pypi_0 pypi
pyflakes 2.4.0 pyhd3eb1b0_0 anaconda
pygments 2.9.0 pypi_0 pypi
pylint 2.12.2 py37haa95532_1 anaconda
pyopenssl 22.0.0 pyhd3eb1b0_0 anaconda
pyparsing 3.0.4 pyhd3eb1b0_0 anaconda
pyqt 5.9.2 py37ha878b3d_0 anaconda
pyrsistent 0.17.3 pypi_0 pypi
pysocks 1.7.1 py37_1 anaconda
python 3.7.13 h6244533_0
python-dateutil 2.8.2 pyhd3eb1b0_0
python-fastjsonschema 2.15.1 pyhd3eb1b0_0 anaconda
python_abi 3.7 2_cp37m conda-forge
pytorch 1.12.0 py3.7_cuda11.6_cudnn8_0 pytorch
pytorch-mutex 1.0 cuda pytorch
pytz 2021.3 pyhd3eb1b0_0 anaconda
pywavelets 1.3.0 pypi_0 pypi
pywin32 302 py37h2bbff1b_2 anaconda
pywin32-ctypes 0.2.0 py37_1001 anaconda
pywinpty 2.0.5 pypi_0 pypi
pyzmq 23.2.0 pypi_0 pypi
qt 5.9.7 vc14h73c81de_0 [vc14] anaconda
qtawesome 1.0.3 pyhd3eb1b0_0 anaconda
qtconsole 5.3.0 pyhd3eb1b0_0 anaconda
qtpy 2.0.1 pyhd3eb1b0_0 anaconda
ray 1.13.0 pypi_0 pypi
requests 2.27.1 pyhd3eb1b0_0 anaconda
requests-oauthlib 1.3.1 pypi_0 pypi
rope 0.22.0 pyhd3eb1b0_0 anaconda
rsa 4.8 pypi_0 pypi
scikit-image 0.19.3 pypi_0 pypi
setuptools 62.6.0 pypi_0 pypi
sip 6.5.1 py37hd77b12b_0 anaconda
six 1.16.0 pyhd3eb1b0_1 anaconda
snowballstemmer 2.2.0 pyhd3eb1b0_0 anaconda
soupsieve 2.3.1 pyhd3eb1b0_0 anaconda
sphinx 4.4.0 pyhd3eb1b0_0 anaconda
sphinxcontrib-applehelp 1.0.2 pyhd3eb1b0_0 anaconda
sphinxcontrib-devhelp 1.0.2 pyhd3eb1b0_0 anaconda
sphinxcontrib-htmlhelp 2.0.0 pyhd3eb1b0_0 anaconda
sphinxcontrib-jsmath 1.0.1 pyhd3eb1b0_0 anaconda
sphinxcontrib-qthelp 1.0.3 pyhd3eb1b0_0 anaconda
sphinxcontrib-serializinghtml 1.1.5 pyhd3eb1b0_0 anaconda
spyder 3.3.6 py37_0 anaconda
spyder-kernels 0.5.2 py37_0 anaconda
sqlite 3.38.3 h2bbff1b_0
tabulate 0.8.9 pypi_0 pypi
tensorboard 2.9.1 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tensorboardx 2.5.1 pypi_0 pypi
tensorflow 2.9.1 pypi_0 pypi
tensorflow-estimator 2.9.0 pypi_0 pypi
tensorflow-io-gcs-filesystem 0.26.0 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
testpath 0.5.0 pyhd3eb1b0_0 anaconda
tifffile 2021.11.2 pypi_0 pypi
tk 8.6.12 h2bbff1b_0
toml 0.10.2 pyhd3eb1b0_0 anaconda
torchaudio 0.12.0 py37_cu116 pytorch
torchvision 0.13.0 py37_cu116 pytorch
tornado 6.1 py37h2bbff1b_0 anaconda
traitlets 5.3.0 pypi_0 pypi
typed-ast 1.4.3 py37h2bbff1b_1 anaconda
typing-extensions 3.10.0.0 pypi_0 pypi
typing_extensions 4.1.1 pyh06a4308_0
urllib3 1.26.9 py37haa95532_0 anaconda
vc 14.2 h21ff451_1
virtualenv 20.14.1 pypi_0 pypi
vs2015_runtime 14.27.29016 h5e58377_2
wcwidth 0.2.5 pyhd3eb1b0_0 anaconda
webencodings 0.5.1 py37_1 anaconda
werkzeug 2.1.2 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0
win_inet_pton 1.1.0 py37haa95532_0 anaconda
wincertstore 0.2 py37haa95532_2
wrapt 1.13.3 py37h2bbff1b_2 anaconda
xz 5.2.5 h8cc25b3_1
zipp 3.7.0 pyhd3eb1b0_0 anaconda
zlib 1.2.11 vc14h1cdd9ab_1 [vc14] anaconda
zstd 1.4.9 h19a0ad4_0

I was wondering if someone could be helpful in this regard. I would be happy to share more information if you need.

My other question: is there any plan in the near future to release a version of warp-drive for the Apple M1 MacBook Pro?

Many thanks in advance!
opened by aslansd 2
[Tutorial 2 + 3] Error when loading test_build.fatbin file in tutorials (No kernel image is available for execution on the device)
Dear WarpDrive team,

I have come across an error that occurs consistently in tutorials 2 and 3.

The error occurs when this particular line of code is run:

cuda_function_manager.load_cuda_from_binary_file(f"{_CUBIN_FILEPATH}/test_build.fatbin")

and the error that pops up is:

RuntimeError: cuModuleLoad failed: no kernel image is available for execution on the device

Is there a solution or a work around for this?

I am trying to learn WarpDrive in order to implement a MARL path-finding scenario for possible use in real applications (e.g., warehouses).

Thank you and I appreciate your prompt reply in this :)
opened by rllyryan 2

Error: Invalid Resource handle.

Hello WarpDrive Team,

A good MARL library indeed. I have tried this library on an old machine and it works fine.

However, when I moved to a new machine, I encountered the following error.

(warp_drive) ***@***-lab-gpu:~/warp-drive-master/warp_drive$ python training/example_training_script.py --env tag_continuous --num_gpus 1 --results_dir ..
We have successfully found 1 GPUs!
Training with 1 GPU(s).
Traceback (most recent call last):
  File "training/example_training_script.py", line 224, in <module>
    setup_trainer_and_train(run_config, results_directory=results_dir)
  File "training/example_training_script.py", line 126, in setup_trainer_and_train
    trainer.train()
  File "/home/mwj/warp-drive-master/warp_drive/training/trainer.py", line 402, in train
    metrics = self._update_model_params(iteration)
  File "/home/mwj/warp-drive-master/warp_drive/training/trainer.py", line 741, in _update_model_params
    loss.backward()
  File "/home/mwj/anaconda3/envs/warp_drive/lib/python3.7/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/mwj/anaconda3/envs/warp_drive/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: Event device type CUDA does not match blocking stream's device type CPU.
Exception ignored in: <function CUDASampler.__del__ at 0x7f86b065e9e0>
Traceback (most recent call last):
  File "/home/mwj/warp-drive-master/warp_drive/managers/function_manager.py", line 637, in __del__
    free(block=self._block, grid=self._grid)
  File "/home/mwj/anaconda3/envs/warp_drive/lib/python3.7/site-packages/pycuda/driver.py", line 480, in function_call
    func._set_block_shape(*block)
pycuda._driver.LogicError: cuFuncSetBlockShape failed: invalid resource handle

And my nvidia-smi command looks like this.

Tue Apr  5 23:10:52 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| 30%   24C    P8    34W / 350W |    326MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1268      G   /usr/lib/xorg/Xorg                 35MiB |
|    0   N/A  N/A      1771      G   /usr/lib/xorg/Xorg                144MiB |
|    0   N/A  N/A      1884      G   /usr/bin/gnome-shell               55MiB |
|    0   N/A  N/A      3043      G   gnome-control-center               12MiB |
|    0   N/A  N/A      6784      G   ...792671094337050779,131072       46MiB |
|    0   N/A  N/A     12488      G   ...RendererForSitePerProcess       15MiB |
+-----------------------------------------------------------------------------+

The result of running run_unittests.py looks like this.

(warp_drive) mwj@mwj-lab-gpu:~/warp-drive-master/warp_drive$ python utils/run_unittests.py
Running Unit tests ... 
/home/mwj/warp-drive-master/warp_drive/cuda_includes/../../example_envs/tag_gridworld/tag_gridworld_step.cu(151): warning #2361-D: invalid narrowing conversion from "unsigned int" to "int"

====================================================================================== test session starts =======================================================================================
platform linux -- Python 3.7.12, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /home/mwj/warp-drive-master
collected 13 items                                                                                                                                                                               

../tests/example_envs/test_tag_continuous.py .                                                                                                                                             [  7%]
../tests/example_envs/test_tag_gridworld.py .                                                                                                                                              [ 15%]
../tests/example_envs/test_tag_gridworld_step_cuda.py .                                                                                                                                    [ 23%]
../tests/example_envs/test_tag_gridworld_step_python.py ..                                                                                                                                 [ 38%]
../tests/warp_drive/test_action_sampler.py ...                                                                                                                                             [ 61%]
../tests/warp_drive/test_data_manager.py ...                                                                                                                                               [ 84%]
../tests/warp_drive/test_env_reset.py .                                                                                                                                                    [ 92%]
../tests/warp_drive/test_function_manager.py .                                                                                                                                             [100%]

======================================================================================== warnings summary ========================================================================================
../../anaconda3/envs/warp_drive/lib/python3.7/site-packages/gym/envs/registration.py:250
  /home/mwj/anaconda3/envs/warp_drive/lib/python3.7/site-packages/gym/envs/registration.py:250: DeprecationWarning: SelectableGroups dict interface is deprecated. Use select.
    for plugin in metadata.entry_points().get(entry_point, []):

../../anaconda3/envs/warp_drive/lib/python3.7/site-packages/pycuda/compyte/dtypes.py:120
  /home/mwj/anaconda3/envs/warp_drive/lib/python3.7/site-packages/pycuda/compyte/dtypes.py:120: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    reg.get_or_register_dtype("bool", np.bool)

-- Docs: https://docs.pytest.org/en/stable/warnings.html
================================================================================= 13 passed, 2 warnings in 5.38s =================================================================================

As the unit tests have passed, I think the cuda version mismatch may not be an issue.

Also, as there are many other environments on this machine, I wonder if there exists a solution to change my environment as little as possible.

So what can I do to fix this issue? Any idea helps.

Many thanks!

opened by Ma-Weijian 2

Warp Drive PyCuda Error

I am currently running a training script using warp-drive.

I have my environment initialized in this dockerfile.

When running my training_script, I get the following error:

python training_script.py --env simple_wood_and_stone

Inside training_script.py: 1 GPUs are available.
Inside env_wrapper.py: 1 GPUs are available.
/home/miniconda/lib/python3.7/site-packages/torch/cuda/__init__.py:120: UserWarning:
    Found GPU%d %s which is of cuda capability %d.%d.
    PyTorch no longer supports this GPU because it is too old.
    The minimum cuda capability supported by this library is %d.%d.

  warnings.warn(old_gpu_warn.format(d, name, major, minor, min_arch // 10, min_arch % 10))
Initializing the CUDA data manager...
Initializing the CUDA function manager...
WARNING:root:the destination header file /home/miniconda/lib/python3.7/site-packages/warp_drive/cuda_includes/env_config.h already exists; remove and rebuild.
WARNING:root:the destination runner file /home/miniconda/lib/python3.7/site-packages/warp_drive/cuda_includes/env_runner.cu already exists; remove and rebuild.
Traceback (most recent call last):
  File "training_script.py", line 109, in <module>
    customized_env_registrar=env_registry,
  File "/home/miniconda/lib/python3.7/site-packages/ai_economist/foundation/env_wrapper.py", line 208, in __init__
    self.cuda_function_manager.initialize_functions([step_function])
  File "/home/miniconda/lib/python3.7/site-packages/warp_drive/managers/function_manager.py", line 330, in initialize_functions
    self._cuda_functions[fname] = self._CUDA_module.get_function(fname)
pycuda._driver.LogicError: cuModuleGetFunction failed: named symbol not found

I was wondering if anyone has run into this before or has any idea how to fix it.

opened by mhelabd 2

How to implement CTDE-based MARL algorithms on the platform?

How can joint-learning-based MARL algorithms (e.g. MAPPO, QMIX) be implemented on WarpDrive, as opposed to independent-learning-based algorithms (such as the PPO implemented in the paper)? Do you plan to provide tutorials on this? Thanks a lot!
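
For readers unfamiliar with the term, here is a minimal sketch of the centralized-training, decentralized-execution (CTDE) structure the question refers to. It is plain NumPy with random weights, not WarpDrive code, and every name in it (`actor_weights`, `critic_weights`, `act`, `joint_value`) is an assumption made for illustration. It only shows which inputs each component sees, not how MAPPO or QMIX are trained.

```python
import numpy as np

rng = np.random.default_rng(0)
num_agents, obs_dim, num_actions = 3, 4, 5

# Decentralized actors: each agent has its own weights and sees only
# its own local observation, both during training and at execution time.
actor_weights = [rng.normal(size=(obs_dim, num_actions)) for _ in range(num_agents)]

# Centralized critic: sees the concatenated observations of all agents.
# It is only needed during training.
critic_weights = rng.normal(size=(num_agents * obs_dim,))


def act(agent_id, local_obs):
    """Pick the greedy action for one agent from its local observation."""
    logits = local_obs @ actor_weights[agent_id]
    return int(np.argmax(logits))


def joint_value(all_obs):
    """Estimate one value for the joint observation of all agents."""
    return float(np.concatenate(all_obs) @ critic_weights)


observations = [rng.normal(size=obs_dim) for _ in range(num_agents)]
actions = [act(i, observations[i]) for i in range(num_agents)]
value = joint_value(observations)
print(actions, value)
```

In an independent-learning setup, each agent's critic would instead take only that agent's `local_obs` as input.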

opened by fmxFranky 6

Creating a 4D custom environment from Gridworld 2D env

Dear all, I am new to reinforcement learning, but I am fascinated by WarpDrive. I was wondering if you could help me build a custom env for a small study project. I want to create a 4D gym environment: a 468x225x182x54 space (1,034,888,400 unique cells), where every cell has its own value. My agent (e.g. a rabbit) can jump anywhere in this space, and the cell it lands on is set to zero (burned) once the rabbit collects its points. The agent is rewarded according to how much the environment's overall points (e.g. 2000) drop when cell values are zeroed. Which cells hold more points is unknown to the agent but fixed, and the agent's task is to discover them by jumping, so that it burns as many high-value cells as possible before the episode ends. I thought my action space could be defined as

class CustomEnv(gym.Env):
    def __init__(self):
        self.action_space = gym.spaces.MultiDiscrete([468, 225, 182, 54])

For example

print(CustomEnv().action_space.sample())
[172 54 101 37]

where my agent collects the reward at location [172 54 101 37], and the value of that cell is now zero. When the game starts, the agent jumps into this 4D space. (I assume it is better to start the first episode at a fixed position with a buffer action, so that no values are zeroed in this first episode, and to let the agent learn during policy training to begin with an action that yields a globally better reward.) Furthermore, I want the step function to work like this: the rabbit makes a jump, then the reward is returned. The returned state is the 4D space with the same shape, with its values changed by the zeroing from the previous action.
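
The jump-and-burn step described above can be sketched as follows. This is a toy CPU version under stated assumptions: the class name `TinyRabbitField` is hypothetical, a small 4x3x2x2 grid stands in for the full space so the sketch runs quickly, and the reward is simply the value of the cell that was hit. It does not use any WarpDrive API.

```python
import numpy as np


class TinyRabbitField:
    """Hypothetical toy env: a small dense 4D grid of cell values."""

    def __init__(self, shape=(4, 3, 2, 2), episode_length=5, seed=0):
        self.shape = shape
        self.episode_length = episode_length
        rng = np.random.default_rng(seed)
        # Cell values are hidden from the agent but fixed across episodes
        self.initial_values = rng.integers(0, 5, size=shape)
        self.reset()

    def reset(self):
        self.values = self.initial_values.copy()
        self.timestep = 0
        return self.values.copy()

    def step(self, action):
        # `action` holds one index per dimension, like a MultiDiscrete sample
        cell = tuple(int(a) for a in action)
        reward = int(self.values[cell])  # points collected by this jump
        self.values[cell] = 0            # the cell is burned
        self.timestep += 1
        done = self.timestep >= self.episode_length
        return self.values.copy(), reward, done, {}


env = TinyRabbitField()
obs = env.reset()
total_before = int(obs.sum())

obs, reward, done, _ = env.step([1, 2, 0, 1])
print(reward, total_before - int(obs.sum()))  # the two numbers match

obs, reward_again, done, _ = env.step([1, 2, 0, 1])
print(reward_again)  # a burned cell yields nothing on a second jump
```

At full size, a dense grid of 1,034,888,400 cells stored as 32-bit integers would take about 4.1 GB per environment copy, which is worth keeping in mind when choosing what to return as the observation.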

However, I don't know how I should define my observation space, and I would really appreciate your help.

So far, for example, I have modified your gridworld example env:

import numpy as np
from gym import spaces
from gym.utils import seeding

# seeding code from https://github.com/openai/gym/blob/master/gym/utils/seeding.py
from warp_drive.utils.constants import Constants
from warp_drive.utils.data_feed import DataFeed
from warp_drive.utils.gpu_environment_context import CUDAEnvironmentContext

_OBSERVATIONS = Constants.OBSERVATIONS
_ACTIONS = Constants.ACTIONS
_REWARDS = Constants.REWARDS

# Our custom field. Note: this stores one 1D array of random cell values per
# axis (lengths 468, 225, 182, 54), not a dense 4D grid. A plain list is used
# because the four arrays have different lengths.
RabbitField_World = [np.random.randint(0, 5, n) for n in (468, 225, 182, 54)]
RabbitField_World_Fixed_Points = sum(int(axis.sum()) for axis in RabbitField_World)
_LOC_X = "cells_dim_x"
_LOC_Y = "cells_dim_y"
_LOC_Z = "cells_dim_z"
_LOC_K = "cells_dim_k"


def burning(dim_world, jump_pos):
    dim_world[jump_pos] = 0 
    return dim_world

class RabbitField:
    """
    A jumping game on a 4D 468x225x182x54 space.
    There are a number of agents (rabbits) trying to minimize the overall points of the space.
    A cell has a value in the range 0 to 4. An agent jumps on a cell and collects
        its points, and the value of the cell becomes zero.
    The reward is the total number of points collected so far.
    """

    def __init__(
        self,
        num_agents=1,
        grid_dim_one=468,
        grid_dim_two=225,
        grid_dim_three=182,
        grid_dim_four=54,
        episode_length=100,
        starting_cells_x=RabbitField_World[0],
        starting_cells_y=RabbitField_World[1],
        starting_cells_z=RabbitField_World[2],
        starting_cells_k=RabbitField_World[3],
        finish_point = 1000,
        seed=None,
        step_cost_for_agent=0.01,
        use_full_observation=True,
        env_backend="cpu"
    ):
        """
        :param num_agents (int): the total number of rabbits. Each env can
            have one or more rabbits.
        :param grid_dim_one, grid_dim_two, grid_dim_three, grid_dim_four (int):
            the sizes of the four dimensions of the world.
        :param episode_length (int): episode length
        :param starting_cells_x ([ndarray], optional): starting cell values
            along the x axis of the 4D space.
        :param starting_cells_y ([ndarray], optional): starting cell values
            along the y axis of the 4D space.
        :param starting_cells_z ([ndarray], optional): starting cell values
            along the z axis of the 4D space.
        :param starting_cells_k ([ndarray], optional): starting cell values
            along the k axis of the 4D space.
        :param finish_point (int): the collected reward sufficient to finish the game.
        :param seed: seeding parameter.
        :param step_cost_for_agent (float): penalty for each jump that a rabbit makes
        :param use_full_observation (bool): boolean indicating whether to
            include all the agents' data in the observation or
            just the nearest neighbor. Defaults to True.
        :param env_backend (str): "cpu", "pycuda" or "numba". Defaults to "cpu".
        """
        assert num_agents > 0
        self.num_agents = num_agents

        assert episode_length > 0
        self.episode_length = episode_length

        self.grid_dim_one = grid_dim_one
        self.grid_dim_two = grid_dim_two
        self.grid_dim_three = grid_dim_three
        self.grid_dim_four = grid_dim_four

        # Seeding
        self.np_random = np.random
        if seed is not None:
            self.seed(seed)


        self.starting_cells_x = starting_cells_x
        self.starting_cells_y = starting_cells_y
        self.starting_cells_z = starting_cells_z
        self.starting_cells_k = starting_cells_k

        # Each possible action is a cell position in the self.RabbitField_World 
        self.step_actions = [468, 225, 182, 54]

        # Defining observation and action spaces
        self.observation_space = None  # Note: this will be set via the env_wrapper

        self.action_space = {
            agent_id: spaces.MultiDiscrete(self.step_actions)
            for agent_id in range(self.num_agents)
        }

        # These will be set during reset (see below)
        self.timestep = None
        self.global_state = None

        # For reward computation
        self.step_cost_for_agent = step_cost_for_agent
        self.finish_point = finish_point  #this is a fixed reward defined by us to end the game
        self.reward_penalty = np.zeros(self.num_agents)
        self.use_full_observation = use_full_observation

        self.env_backend = env_backend

    name = "RabbitField"

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def set_global_state(self, key=None, value=None, t=None, dtype=None):
        assert key is not None
        if dtype is None:
            dtype = np.int32

        # If no values are passed, set everything to zeros.
        if key not in self.global_state:
            self.global_state[key] = np.zeros(
                (self.episode_length + 1, self.num_agents), dtype=dtype
            )

        if t is not None and value is not None:
            assert isinstance(value, np.ndarray)
            assert value.shape[0] == self.global_state[key].shape[1]

            self.global_state[key][t] = value



    def update_state(self, actions_x, actions_y, actions_z, actions_k):
        loc_x_prev_t = self.global_state[_LOC_X][self.timestep - 1]
        loc_y_prev_t = self.global_state[_LOC_Y][self.timestep - 1]
        loc_z_prev_t = self.global_state[_LOC_Z][self.timestep - 1]
        loc_k_prev_t = self.global_state[_LOC_K][self.timestep - 1]


        loc_x_curr_t = burning(loc_x_prev_t, actions_x)
        loc_y_curr_t = burning(loc_y_prev_t, actions_y)
        loc_z_curr_t = burning(loc_z_prev_t, actions_z)
        loc_k_curr_t = burning(loc_k_prev_t, actions_k)


        self.set_global_state(key=_LOC_X, value=loc_x_curr_t, t=self.timestep)
        self.set_global_state(key=_LOC_Y, value=loc_y_curr_t, t=self.timestep)
        self.set_global_state(key=_LOC_Z, value=loc_z_curr_t, t=self.timestep)
        self.set_global_state(key=_LOC_K, value=loc_k_curr_t, t=self.timestep)

        # Custom reward: the more points the 4D space has lost overall,
        # the larger the reward.
        self.reward_collection = RabbitField_World_Fixed_Points - (
            sum(loc_x_curr_t) + sum(loc_y_curr_t) + sum(loc_z_curr_t) + sum(loc_k_curr_t)
        )
        # The game finishes once enough points have been collected.
        tag = bool(self.reward_collection >= self.finish_point)

        # Every agent receives the same shared reward.
        rew = {agent_id: self.reward_collection for agent_id in range(self.num_agents)}

        return rew, tag

    def generate_observation(self):
        obs = {}
        if self.use_full_observation:
            common_obs = None
            for feature in [
                _LOC_X,
                _LOC_Y,
                _LOC_Z,
                _LOC_K,
            ]:
                if common_obs is None:
                    common_obs = self.global_state[feature][self.timestep]
                else:
                    common_obs = np.vstack(
                        (common_obs, self.global_state[feature][self.timestep])
                    )
            normalized_common_obs = common_obs 

            agent_types = np.array(
                [self.agent_type[agent_id] for agent_id in range(self.num_agents)]
            )

            for agent_id in range(self.num_agents):
                agent_indicators = np.zeros(self.num_agents)
                agent_indicators[agent_id] = 1
                obs[agent_id] = np.concatenate(
                    [
                        np.vstack(
                            (normalized_common_obs, agent_types, agent_indicators)
                        ).reshape(-1),
                        np.array([float(self.timestep) / self.episode_length]),
                    ]
                )
        else:
            for agent_id in range(self.num_agents):
                feature_list = []
                for feature in [
                    _LOC_X,
                    _LOC_Y,
                    _LOC_Z,
                    _LOC_K,
                ]:
                    feature_list.append(
                        self.global_state[feature][self.timestep][agent_id]
                    )
                if agent_id < self.num_agents - 1:
                    for feature in [
                        _LOC_X,
                        _LOC_Y,
                        _LOC_Z,
                        _LOC_K,
                    ]:
                        feature_list.append(
                            self.global_state[feature][self.timestep][-1]
                        )
                else:
                    dist_array = None
                    for feature in [
                        _LOC_X,
                        _LOC_Y,
                        _LOC_Z,
                        _LOC_K,
                    ]:
                        if dist_array is None:
                            dist_array = np.square(
                                self.global_state[feature][self.timestep][:-1]
                                - self.global_state[feature][self.timestep][-1]
                            )
                        else:
                            dist_array += np.square(
                                self.global_state[feature][self.timestep][:-1]
                                - self.global_state[feature][self.timestep][-1]
                            )
                    min_agent_id = np.argmin(dist_array)
                    for feature in [
                        _LOC_X,
                        _LOC_Y,
                        _LOC_Z,
                        _LOC_K,
                    ]:
                        feature_list.append(
                            self.global_state[feature][self.timestep][min_agent_id]
                        )
                feature_list += [
                    self.agent_type[agent_id],
                    float(self.timestep) / self.episode_length,
                ]
                obs[agent_id] = np.array(feature_list)
        return obs

    def reset(self):
        # Reset time to the beginning
        self.timestep = 0

        # Re-initialize the global state
        self.global_state = {}
        self.set_global_state(
            key=_LOC_X, value=self.starting_cells_x, t=self.timestep, dtype=np.int32
        )
        self.set_global_state(
            key=_LOC_Y, value=self.starting_cells_y, t=self.timestep, dtype=np.int32
        )
        self.set_global_state(
            key=_LOC_Z, value=self.starting_cells_z, t=self.timestep, dtype=np.int32
        )
        self.set_global_state(
            key=_LOC_K, value=self.starting_cells_k, t=self.timestep, dtype=np.int32
        )
        return self.generate_observation()

    def step(
        self,
        actions=None,
    ):
        self.timestep += 1
        assert isinstance(actions, dict)
        assert len(actions) == self.num_agents

        actions_x = np.array(
            [
                actions[agent_id][0]
                for agent_id in range(self.num_agents)
            ]
        )
        actions_y = np.array(
            [
                actions[agent_id][1]
                for agent_id in range(self.num_agents)
            ]
        )
        actions_z = np.array(
            [
                actions[agent_id][2]
                for agent_id in range(self.num_agents)
            ]
        )
        actions_k = np.array(
            [
                actions[agent_id][3]
                for agent_id in range(self.num_agents)
            ]
        )

        rew, tag = self.update_state(actions_x, actions_y, actions_z, actions_k)
        obs = self.generate_observation()
        done = {"__all__": self.timestep >= self.episode_length or tag}
        info = {}

        return obs, rew, done, info


class CUDARabbitField(RabbitField, CUDAEnvironmentContext):
    """
    CUDA version of the RabbitField environment.
    Note: this class subclasses the Python environment class RabbitField,
    and also the  CUDAEnvironmentContext
    """

    def get_data_dictionary(self):
        data_dict = DataFeed()
        for feature in [
            _LOC_X,
            _LOC_Y,
            _LOC_Z,
            _LOC_K,
        ]:
            data_dict.add_data(
                name=feature,
                data=self.global_state[feature][0],
                save_copy_and_apply_at_reset=True,
                log_data_across_episode=True,
            )
        data_dict.add_data_list(
            [
                ("finish_point", self.finish_point),
                ("step_cost_for_agent", self.step_cost_for_agent),
                ("use_full_observation", self.use_full_observation),
            ]
        )
        return data_dict

    def get_tensor_dictionary(self):
        tensor_dict = DataFeed()
        return tensor_dict

    def step(self, actions=None):
        self.timestep += 1
        args = [
            _LOC_X,
            _LOC_Y,
            _LOC_Z,
            _LOC_K,
            _ACTIONS,
            "_done_",
            _REWARDS,
            _OBSERVATIONS,
            "finish_point",
            "step_cost_for_agent",
            "use_full_observation",
            "_timestep_",
            ("episode_length", "meta"),
        ]
        if self.env_backend == "pycuda":
            self.cuda_step(
                *self.cuda_step_function_feed(args),
                block=self.cuda_function_manager.block,
                grid=self.cuda_function_manager.grid,
            )
        elif self.env_backend == "numba":
            self.cuda_step[
                self.cuda_function_manager.grid, self.cuda_function_manager.block
            ](*self.cuda_step_function_feed(args))
        else:
            raise Exception("CUDARabbitField expects env_backend = 'pycuda' or 'numba' ")

opened by Mshz2 3

Update README.md for limitations

The README states that this library is useful and fast for simple RL problems, and that the environment it contains was kept simple for ease of understanding. This leads me to my question. [Apologies, I don't know of another way to send you this message without a GitHub issue; it is just my lack of understanding, but perhaps it will help others.]

What are the limitations of this library?

Could I create an environment such as a humanoid, create multiple instances of humanoids in one environment, and have them learn [" " cheat] from each other the fastest way to get across the environment [a 100m dash, for example]?

Could this library be used to train agents within a Unity environment [probably not actually training in the Unity environment itself, but rather visualizing in Unity after training]?

opened by nubonics 3

Numba #Enhancement

Perhaps I misunderstand [very possible] what Numba is for, but maybe it could be used to avoid having to learn CUDA C: you would instead just write Python code that Numba can translate into CUDA C code [which runs directly on the GPU]?

opened by nubonics 3

Releases(v2.0)

v2.0 (Sep 30, 2022)

Supports dual backends: CUDA C and JIT-compiled Numba.

Supports end-to-end simulation and training on multiple GPUs with either CUDA C or Numba.

Full backward compatibility with v1.0.

v1.6 (Aug 19, 2022)

Using the extreme parallelization capability of GPUs, WarpDrive enables orders-of-magnitude faster RL compared to CPU simulation + GPU model implementations.

Avoids back-and-forth data copying between the CPU and the GPU, which makes it extremely efficient.

Runs simulations across multiple agents and multiple environment replicas in parallel.

Provides auto-scaling tools to achieve the optimal throughput per device (version 1.3).

Performs distributed asynchronous training among multiple GPU devices (version 1.4).

Combines multiple GPU blocks for one environment replica (version 1.6).

Learning to Communicate with Deep Multi-Agent Reinforcement Learning in PyTorch

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

Pre-trained Deep Learning models and demos (high quality and extremely fast)

🐤 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation

Learning recognition/segmentation models without end-to-end training. 40%-60% less GPU memory footprint. Same training time. Better performance.

Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

A parallel framework for population-based multi-agent reinforcement learning.

A library of multi-agent reinforcement learning components and systems

Pytorch implementations of popular off-policy multi-agent reinforcement learning algorithms, including QMix, VDN, MADDPG, and MATD3.

Official Implementation of 'UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers' ICLR 2021(spotlight)

Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks (MAPDN)

CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energy Management, 2020, PikaPika team

Multi-agent reinforcement learning algorithm and environment

Offline Multi-Agent Reinforcement Learning Implementations: Solving Overcooked Game with Data-Driven Method

Deep Reinforcement Learning based Trading Agent for Bitcoin

Urban mobility simulations with Python3, RLlib (Deep Reinforcement Learning) and Mesa (Agent-based modeling)

Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation. Intel iHD GPU (iGPU) support. NVIDIA GPU (dGPU) support.

High performance Cross-platform Inference-engine, you could run Anakin on x86-cpu,arm, nv-gpu, amd-gpu,bitmain and cambricon devices.

GrabGpu_py: a script to grab a GPU when one is free