RLMeta is a light-weight flexible framework for Distributed Reinforcement Learning Research.

Meta Research

Last update: Dec 22, 2022

Overview

RLMeta

rlmeta - a flexible lightweight research framework for Distributed Reinforcement Learning based on PyTorch and moolib

Installation

To build from source, please install PyTorch first, and then run the commands below.

$ git clone https://github.com/facebookresearch/rlmeta
$ cd rlmeta
$ git submodule sync && git submodule update --init --recursive
$ pip install -e .

Run an Example

To run the Atari Pong example with the PPO algorithm:

$ cd examples/atari/ppo
$ python atari_ppo.py env="PongNoFrameskip-v4" num_epochs=20

We use hydra to define configs for training jobs. The configs are defined in

./conf/conf_ppo.yaml

The logs and checkpoints will be automatically saved to

./outputs/{YYYY-mm-dd}/{HH:MM:SS}/

After training, we can draw the training curve by running

$ python ../../plot.py --log_file=./outputs/{YYYY-mm-dd}/{HH:MM:SS}/atari_ppo.log --fig_file=./atari_ppo.png --xkey=time

One example of the training curve is shown below.

License

rlmeta is licensed under the MIT License. See LICENSE for details.

Comments

m_server::push time out and m_server::act time out

I was trying to run the example program atari_ppo.py on the following machine:

Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
32GB RAM
GTX 1080 with 8GB of memory
Ubuntu 16.04
CUDA 10.2

I edited my configuration file conf_ppo.yaml to reduce resource usage:

m_server_name: "m_server"
m_server_addr: "127.0.0.1:4411"

r_server_name: "r_server"
r_server_addr: "127.0.0.1:4412"

c_server_name: "c_server"
c_server_addr: "127.0.0.1:4413"

train_device: "cuda:0"
infer_device: "cuda:0"

timeout: 180

env: "PongNoFrameskip-v4"
max_episode_steps: 2700

num_train_rollouts: 1 
num_train_workers: 1

num_eval_rollouts: 1
num_eval_workers: 1

replay_buffer_size: 1024 
prefetch: 2

batch_size: 32
lr: 3e-4
push_every_n_steps: 50

num_epochs: 1000
steps_per_epoch: 3000

num_eval_episodes: 20

train_seed: 123
eval_seed: 456

Here is what I got:

[2022-01-18 18:34:54,797][root][INFO] - {'m_server_name': 'm_server', 'm_server_addr': '127.0.0.1:4411', 'r_server_name': 'r_server', 'r_server_addr': '127.0.0.1:4412', 'c_server_name': 'c_server', 'c_server_addr': '127.0.0.1:4413', 'train_device': 'cuda:0', 'infer_device': 'cuda:0', 'env': 'PongNoFrameskip-v4', 'max_episode_steps': 2700, 'num_train_rollouts': 1, 'num_train_workers': 1, 'num_eval_rollouts': 1, 'num_eval_workers': 1, 'replay_buffer_size': 1024, 'prefetch': 2, 'batch_size': 8, 'lr': 0.0003, 'push_every_n_steps': 100, 'num_epochs': 20, 'steps_per_epoch': 300, 'num_eval_episodes': 20, 'train_seed': 123, 'eval_seed': 456}
[2022-01-18 18:35:08,193][root][INFO] - Warming up replay buffer: [    0 / 1024 ]
[2022-01-18 18:35:09,194][root][INFO] - Warming up replay buffer: [    0 / 1024 ]
[2022-01-18 18:35:10,196][root][INFO] - Warming up replay buffer: [    0 / 1024 ]
[2022-01-18 18:35:11,198][root][INFO] - Warming up replay buffer: [    0 / 1024 ]
[2022-01-18 18:35:12,220][root][INFO] - Warming up replay buffer: [    0 / 1024 ]
[2022-01-18 18:35:13,222][root][INFO] - Warming up replay buffer: [  894 / 1024 ]
[2022-01-18 18:35:14,228][root][INFO] - Warming up replay buffer: [  894 / 1024 ]
[2022-01-18 18:35:15,229][root][INFO] - Warming up replay buffer: [  894 / 1024 ]
[2022-01-18 18:35:16,231][root][INFO] - Warming up replay buffer: [ 1024 / 1024 ]
Exception in callback handle_task_exception(<Task finishe...) timed out')>) at /media/research/ml2558/rlmeta/rlmeta/utils/asycio_utils.py:11
handle: <Handle handle_task_exception(<Task finishe...) timed out')>) at /media/research/ml2558/rlmeta/rlmeta/utils/asycio_utils.py:11>
Traceback (most recent call last):
  File "/home/ml2558/miniconda3/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/media/research/ml2558/rlmeta/rlmeta/utils/asycio_utils.py", line 17, in handle_task_exception
    raise e
  File "/media/research/ml2558/rlmeta/rlmeta/utils/asycio_utils.py", line 13, in handle_task_exception
    task.result()
  File "/media/research/ml2558/rlmeta/rlmeta/core/loop.py", line 161, in _run_loop
    stats = await self._run_episode(env, agent, index)
  File "/media/research/ml2558/rlmeta/rlmeta/core/loop.py", line 182, in _run_episode
    action = await agent.async_act(timestep)
  File "/media/research/ml2558/rlmeta/rlmeta/agents/ppo/ppo_agent.py", line 78, in async_act
    action, logpi, v = await self.model.async_act(
RuntimeError: Call (m_server::act) timed out
Error executing job with overrides: ['env=PongNoFrameskip-v4', 'num_epochs=20']
Traceback (most recent call last):
  File "/media/research/ml2558/rlmeta/examples/atari/ppo/atari_ppo.py", line 96, in main
    stats = agent.train(cfg.steps_per_epoch)
  File "/media/research/ml2558/rlmeta/rlmeta/agents/ppo/ppo_agent.py", line 139, in train
    self.model.push()
  File "/media/research/ml2558/rlmeta/rlmeta/core/model.py", line 69, in push
    self.client.sync(self.server_name, "push", state_dict)
RuntimeError: Call (m_server::<unknown>) timed out

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I tried changing the timeout but got the same error. Any hints on how to resolve this?
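
One thing worth checking in a report like this is whether the values in the edited YAML are the ones the job actually ran with. The sketch below compares a subset of the keys from the YAML above with the same keys from the logged config line. The helper name diff_configs and the MISSING marker are illustrative and not part of rlmeta or hydra; the values are copied from the two listings above.

```python
MISSING = "<missing>"  # illustrative marker for keys absent from one side


def diff_configs(expected, actual):
    """Return {key: (expected_value, actual_value)} for every key that differs."""
    keys = sorted(set(expected) | set(actual))
    return {
        key: (expected.get(key, MISSING), actual.get(key, MISSING))
        for key in keys
        if expected.get(key, MISSING) != actual.get(key, MISSING)
    }


# Subset of the values shown in the edited conf_ppo.yaml above.
yaml_values = {
    "timeout": 180,
    "batch_size": 32,
    "lr": 3e-4,
    "push_every_n_steps": 50,
    "num_epochs": 1000,
    "steps_per_epoch": 3000,
}

# The same keys as they appear in the logged config line above.
logged_values = {
    "batch_size": 8,
    "lr": 0.0003,
    "push_every_n_steps": 100,
    "num_epochs": 20,
    "steps_per_epoch": 300,
}

differences = diff_configs(yaml_values, logged_values)
for key, (in_yaml, in_log) in differences.items():
    print(f"{key}: yaml={in_yaml} log={in_log}")
```

With these inputs the helper reports differences for batch_size, num_epochs, push_every_n_steps, steps_per_epoch, and timeout (which does not appear in the logged line). num_epochs=20 comes from the command-line override; the other mismatches suggest the logged run may not have used the YAML exactly as posted.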

opened by lmlaaron 5

Replay buffer crashes after being cleared

Minimal example:

import torch
from _rlmeta_extension import UniformSampler
from rlmeta.core.replay_buffer import ReplayBuffer
from rlmeta.storage import TensorCircularBuffer

replay_buffer = ReplayBuffer(TensorCircularBuffer(12), UniformSampler())

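# Repeatedly fill the buffer, sampling after every append, then clear it.
while True:
    for t in torch.randn(size=(12, 2)).chunk(12, dim=0):
        replay_buffer.append(t)
        replay_buffer.sample(12)
    # Per the issue title, the crash occurs after the buffer has been cleared.
    replay_buffer.clear()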

Stack trace:

RuntimeError: output with shape [2] doesn't match the broadcast shape [1, 2]
Exception raised from mark_resize_outputs at ../aten/src/ATen/TensorIterator.cpp:1181 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7fd72c9a220e in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5c (0x7fd72c97d5e8 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #2: at::TensorIteratorBase::mark_resize_outputs(at::TensorIteratorConfig const&) + 0x241 (0x7fd755cf6301 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x64 (0x7fd755cf6e54 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x19d4f8c (0x7fd755f11f8c in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #5: at::native::copy_(at::Tensor&, at::Tensor const&, bool) + 0x62 (0x7fd755f12ec2 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #6: at::_ops::copy_::redispatch(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) + 0x75 (0x7fd756886555 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x46e94f5 (0x7fd758c264f5 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #8: at::_ops::copy_::redispatch(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) + 0x75 (0x7fd756886555 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x46ea6ad (0x7fd758c276ad in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) + 0x16e (0x7fd7568cdbce in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0x495df (0x7fd7024265df in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #12: <unknown function> + 0x4a0c0 (0x7fd7024270c0 in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #13: <unknown function> + 0x1dd0f (0x7fd7023fad0f in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
<omitting python frames>
frame #31: <unknown function> + 0x3feb0 (0x7fd7aa936eb0 in /lib64/libc.so.6)
frame #32: __libc_start_main + 0x80 (0x7fd7aa936f60 in /lib64/libc.so.6)
frame #33: _start + 0x25 (0x5649803a1095 in /home/d3sm0/.venvs/torch_env/bin/python)

opened by d3sm0 4

Add more logging + ability to push to more downstream models

This PR adds:

repr functions for some classes

Rich-based logging: https://rich.readthedocs.io/en/stable/introduction.html

Extra parameter additional_downstream_models to agent.train that can be used to push to more than one downstream model (e.g., if there are multiple parallel loops, push to the model that each loop is using).

CLA Signed
opened by EntilZha 4

Added namespace as a member of Remotable and updated example

Added an identifier for Remotable classes. This makes it possible to distinguish between instances of the same class, or of different classes, that share the same method, as long as the user defines a method.

Alternatively or additionally, we can also add the function name to the identifier to further distinguish between different instantiated classes with the same name. For example, PPOAgent.forward() and APPOAgent.forward() would not need an additional identifier.

CLA Signed

opened by JD-ETH 4

Switch to new OpenAI Gym step API

The new version of OpenAI Gym uses a new step API which returns (observations, reward, termination, truncation, info) instead of (observations, reward, done, info). We have to update the wrappers to support this.

Progress is tracked in this issue.
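
As a rough illustration of what such a wrapper has to do, the sketch below normalizes either step result to the old four-element form. The function name to_done_step is hypothetical and is not taken from rlmeta or Gym; it only assumes the two tuple layouts described above.

```python
def to_done_step(step_result):
    """Normalize a step result to the old (observation, reward, done, info) form.

    Accepts either the new 5-tuple (observation, reward, terminated, truncated, info)
    or the old 4-tuple, which is returned unchanged.
    """
    if len(step_result) == 5:
        observation, reward, terminated, truncated, info = step_result
        # An episode is over if it terminated naturally or was truncated.
        return observation, reward, bool(terminated or truncated), info
    if len(step_result) == 4:
        return tuple(step_result)
    raise ValueError(f"unexpected step result of length {len(step_result)}")
```

Collapsing the two flags loses the distinction between termination and truncation, which matters for bootstrapping value targets, so a full migration would likely keep both flags rather than merge them.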

opened by xiaomengy 2

Longer-term and relation to other RL libraries under Meta

Hi, excited to see this work on distributed RL, building off moolib (and TorchBeast originally). I'm wondering what the longer-term direction of this project is?

Will functionality be merged into TorchRL (which mentions an upcoming IMPALA implementation)? https://github.com/facebookresearch/rl#upcoming-features

Is moolib still being maintained? https://github.com/facebookresearch/moolib/issues/32#issuecomment-1085730793

There are so many RL libraries these days.

opened by etaoxing 2

How to sample partial trajectories?

Many value estimation methods rely on sub-sequences of a trajectory (e.g. Retrace, GAE, n-step returns, lambda-returns). How can this be achieved with the current samplers? A simple workaround would be to use a clever idx for each sample and use __get__ to extract the sub-sequence one element at a time, but I believe it might hurt performance.

Other ideas? Otherwise how can this be implemented in the c++ code?
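
To make the request concrete, here is a minimal Python-side sketch of what sampling a contiguous sub-sequence and computing an n-step return over it could look like. It does not use rlmeta's samplers or its C++ storage; the function names sample_subsequence and n_step_return are illustrative only.

```python
import random


def sample_subsequence(trajectory, length, rng=random):
    """Return a random contiguous slice of `length` steps from a trajectory."""
    if length > len(trajectory):
        raise ValueError("sub-sequence is longer than the trajectory")
    start = rng.randrange(len(trajectory) - length + 1)
    return trajectory[start:start + length]


def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted sum of `rewards` plus the discounted bootstrap value.

    Computes r_0 + gamma * r_1 + ... + gamma^(n-1) * r_(n-1) + gamma^n * V.
    """
    value = bootstrap_value
    for reward in reversed(rewards):
        value = reward + gamma * value
    return value


# Usage: take 3 consecutive rewards from a 10-step trajectory.
rewards = [float(i) for i in range(10)]
window = sample_subsequence(rewards, 3, random.Random(0))
target = n_step_return(window, bootstrap_value=0.0, gamma=0.99)
```

Slicing a whole window at once avoids the per-element lookups mentioned above, but it assumes trajectories are stored contiguously, which may not hold for the existing circular buffer.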

opened by d3sm0 1

Add passthrough rescaler + Git CI style checker

This PR adds:

A passthrough rescaler to use if you don't want to rescale rewards.

GitHub workflow to run the yapf format checker on the main branch and on PRs to the main branch.

Tested the build on my fork here after adding this PR branch (then deleted it before making the PR).

CLA Signed
opened by EntilZha 1

Refactor Atari Models and Atari Game settings

This PR makes the following changes.

Switch to the recommended settings for Atari Game based on https://arxiv.org/abs/1709.06009.

Refactor Atari model's implementation and add Impala backbone.

Update the default hyper-parameters of Ape-X DQN to R2D2-like settings.

CLA Signed
opened by xiaomengy 0

Deprecate old atari_wrappers

This PR deprecates the old Atari wrappers.

Switch to Atari-v5 env as suggested in https://brosa.ca/blog/ale-release-v0.7

Use gym.wrappers.AtariPreprocessing to replace old atari_wrappers.

Add random seed for model server.

Remove TimeLimitWrapper and switch to gym.wrappers.TimeLimit.

CLA Signed
opened by xiaomengy 0

Update Ape-X DQN implementation with tricks in MEME

This PR updates the Ape-X DQN implementation with tricks introduced in DeepMind's MEME paper. https://arxiv.org/pdf/2209.07550.pdf

Bootstrapping with online net

Q-value clip

CLA Signed
opened by xiaomengy 0

Pip installation fails in virtual env and SIGILL on DGX machines

It seems that pip install -e . prepares the proper directories but does not include the built package. We solved this by adding:

+ include_package_data=False,
+ packages=find_packages(include=['rlmeta', 'rlmeta.*']),

here: https://github.com/facebookresearch/rlmeta/blob/c43d0f11922b2b8d513b3227242844596dbc34e5/setup.py#L87

Nit: It might be useful to provide an easy way to pass a CUDA/cuDNN path to cmake, maybe something like DCUDNN_LIBRARY_PATH=os.environ.get("CUDA_LIBRARY_PATH", "")
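
A sketch of how this suggestion might look in a build script follows. It assumes the CMake project reads a CUDNN_LIBRARY_PATH variable, which is taken from the suggestion above rather than verified against rlmeta's build files; the helper name cmake_cuda_args is hypothetical.

```python
import os


def cmake_cuda_args(env=None):
    """Build extra cmake arguments from an optional CUDA_LIBRARY_PATH variable.

    Assumption: the CMake project honors -DCUDNN_LIBRARY_PATH=<dir>.
    Returns an empty list when the variable is unset or empty.
    """
    env = os.environ if env is None else env
    library_path = env.get("CUDA_LIBRARY_PATH", "")
    if not library_path:
        return []
    return [f"-DCUDNN_LIBRARY_PATH={library_path}"]


# Usage: append the result to the argument list passed to cmake.
extra_args = cmake_cuda_args({"CUDA_LIBRARY_PATH": "/usr/local/cuda/lib64"})
```

Returning an empty list when the variable is unset leaves the default detection untouched, so users who do not need the override see no change.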

Finally, the flag -march=native might cause issues, especially on HPC clusters: a binary built for the build machine's CPU can raise SIGILL on nodes with a different CPU. We removed it for our cluster and managed to train reliably on different machines.

opened by d3sm0 0

TensorCircularBuffer with a capacity of 1 million fails

A replay buffer with a capacity of 1 million tries to allocate 846.72 GB. Steps to reproduce:

from rlmeta.storage import TensorCircularBuffer
import torch

rb = TensorCircularBuffer(capacity=int(1e6))
rb.append(torch.randn(10, 3, 84, 84))

Log:

RuntimeError: [enforce fail at alloc_cpu.cpp:66] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 846720000000 bytes. Error code 12 (Cannot allocate memory)
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x55 (0x7fd5b71980c5 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::alloc_cpu(unsigned long) + 0x7ac (0x7fd5b71894cc in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x23bc3 (0x7fd5b7176bc3 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #3: at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>) + 0x7bf (0x7fd5e04a5b2f in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #4: at::detail::empty_cpu(c10::ArrayRef<long>, c10::ScalarType, bool, c10::optional<c10::MemoryFormat>) + 0x40 (0x7fd5e04a64a0 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #5: at::detail::empty_cpu(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x34 (0x7fd5e04a64f4 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #6: at::native::empty_cpu(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x1f (0x7fd5e09b826f in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x24f700b (0x7fd5e122a00b in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #8: at::_ops::empty_memory_format::redispatch(c10::DispatchKeySet, c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0xe3 (0x7fd5e0f75653 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x24d200f (0x7fd5e120500f in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::_ops::empty_memory_format::call(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x1b7 (0x7fd5e0fb3077 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0x4586c (0x7fd5b5ba886c in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #12: <unknown function> + 0x49700 (0x7fd5b5bac700 in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #13: <unknown function> + 0x4a0c0 (0x7fd5b5bad0c0 in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #14: <unknown function> + 0x1dd0f (0x7fd5b5b80d0f in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
<omitting python frames>
frame #30: <unknown function> + 0x3feb0 (0x7fd65cacbeb0 in /lib64/libc.so.6)
frame #31: __libc_start_main + 0x80 (0x7fd65cacbf60 in /lib64/libc.so.6)
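
The byte count in the error matches a buffer that preallocates capacity × item size. The short calculation below reproduces it, assuming 4 bytes per element (torch.randn returns float32 by default); the helper name buffer_bytes is illustrative.

```python
from math import prod


def buffer_bytes(capacity, item_shape, bytes_per_element):
    """Bytes needed to preallocate `capacity` items of the given shape."""
    return capacity * prod(item_shape) * bytes_per_element


# 1,000,000 items of shape (10, 3, 84, 84) stored as float32 (4 bytes each).
float32_total = buffer_bytes(1_000_000, (10, 3, 84, 84), 4)
print(float32_total)  # 846720000000

# The same items stored as uint8 (1 byte each) would need a quarter of that.
uint8_total = buffer_bytes(1_000_000, (10, 3, 84, 84), 1)
print(uint8_total)  # 211680000000
```

So the allocation itself is consistent with the requested capacity and item shape: each appended item is a batch of 10 frames, about 0.85 MB as float32, and one million of them cannot fit in RAM on a typical machine.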

opened by d3sm0 1

Moolib Backend Issues

Recently there have been several issues with the moolib backend.

Based on the observations in https://github.com/facebookresearch/moolib/issues/36, there is a performance regression in moolib.

There are several installation issues in moolib.

Based on this, we are considering building another backend that does not use moolib. This issue tracks the progress.

PR for gRPC backend: https://github.com/facebookresearch/rlmeta/pull/63
opened by xiaomengy 4

Add ProcessManager to maintain processes.

Currently the processes are created directly in Server and Loop. It is very common that there are some zombie processes left when the main process terminates. It may be better to have a ProcessManager to manage the processes on a single node.

This issue tracks the feature request.
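
One possible shape for such a manager is sketched below. It is a simplified illustration that tracks subprocess.Popen children rather than the processes rlmeta creates in Server and Loop, and the class and method names are hypothetical.

```python
import atexit
import subprocess
import sys


class ProcessManager:
    """Hypothetical helper that tracks child processes and cleans them up."""

    def __init__(self):
        self._procs = []
        # Also clean up if the interpreter exits without an explicit shutdown.
        atexit.register(self.shutdown)

    def spawn(self, args):
        """Start a child process and remember it for later cleanup."""
        proc = subprocess.Popen(args)
        self._procs.append(proc)
        return proc

    def shutdown(self, timeout=5.0):
        """Terminate all live children, killing any that ignore the request."""
        for proc in self._procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in self._procs:
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
        self._procs.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.shutdown()


# Usage: the child is reaped when the block exits, even on an exception.
with ProcessManager() as manager:
    child = manager.spawn([sys.executable, "-c", "import time; time.sleep(60)"])
```

Waiting on each child after terminating it is what prevents zombies: the exit status is collected instead of being left for the operating system to hold.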

opened by xiaomengy 0
