A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)

Applied Reinforcement Learning @ Facebook

Overview

ReAgent is an open source end-to-end platform for applied reinforcement learning (RL) developed and used at Facebook. ReAgent is built in Python and uses PyTorch for modeling and training and TorchScript for model serving. The platform contains workflows to train popular deep RL algorithms and includes data preprocessing, feature transformation, distributed training, counterfactual policy evaluation, and optimized serving. For more detailed information about ReAgent see the white paper here.

The platform was previously named "Horizon", but we adopted the name "ReAgent" to emphasize its broader scope in decision making and reasoning.

Algorithms Supported

Installation

ReAgent can be installed via Docker or manually. Detailed instructions on how to install ReAgent can be found here.

Usage

Detailed instructions on how to use ReAgent Models can be found here.

The ReAgent Serving Platform (RASP) tutorial is available here.

License

ReAgent is released under a BSD 3-Clause license. Find out more about it here.

Citing

@article{gauci2018horizon,
  title={Horizon: Facebook's Open Source Applied Reinforcement Learning Platform},
  author={Gauci, Jason and Conti, Edoardo and Liang, Yitao and Virochsiri, Kittipat and Chen, Zhengxing and He, Yuchen and Kaden, Zachary and Narayanan, Vivek and Ye, Xiaohui},
  journal={arXiv preprint arXiv:1811.00260},
  year={2018}
}

Comments
  • Upgrade ReAgent to use Python 3.8

    Upgrade ReAgent to use Python 3.8

    Summary: Currently, we have some test failures (https://app.circleci.com/pipelines/github/facebookresearch/ReAgent/1460/workflows/ecc21254-779b-4a89-a40d-ea317e839d96/jobs/8655) because we are missing some recent language features.

    Differential Revision: D26977836

    Merged fb-exported cla signed 
    opened by czxttkl 38
  • Reimplement MDNRNN using new gym.

    Reimplement MDNRNN using new gym.

    Using our new gym, test MDNRNN feature importance/sensitivity. Also, train DQN to play POMDP string game with states embedded with MDNRNN. This is in preparation to nuke old gym folder.

    Merged 
    opened by kaiwenw 23
  • Lightning SACTrainer

    Lightning SACTrainer

    Summary:

    • Created ReAgentLightningModule as a base class to implement the generator API
    • Implemented reporting for SAC

    TODOs:

    • Convert TD3 to LightningModule
    • Fix the OSS version of model manager
    • Fix on-policy training with Gym (by creating GymDataModule)

    Differential Revision: D23857511

    Merged fb-exported cla signed 
    opened by kittipatv 22
  • add env flag to skip frozen registry check

    add env flag to skip frozen registry check

    Summary: The environment variable SKIP_FROZEN_REGISTRY_CHECK is checked. If it is set to a non-zero value, we print a warning instead of raising an error when attempting to add members to a frozen registry.
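A minimal sketch of this warn-instead-of-raise pattern (function and registry names here are illustrative, not the actual ReAgent code):

```python
import os
import warnings


def check_frozen(registry_name: str) -> None:
    """Raise when adding members to a frozen registry, unless the
    SKIP_FROZEN_REGISTRY_CHECK environment variable is set to a
    non-zero value, in which case only warn."""
    skip = os.environ.get("SKIP_FROZEN_REGISTRY_CHECK", "0") != "0"
    if skip:
        warnings.warn(
            f"Adding a member to frozen registry '{registry_name}'; "
            "proceeding because SKIP_FROZEN_REGISTRY_CHECK is set."
        )
    else:
        raise RuntimeError(f"Registry '{registry_name}' is frozen")
```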

    Differential Revision: D32773682

    fb-exported cla signed 
    opened by alexnikulkov 17
  • add async_run_episode to gymrunner to support envs with async step methods

    add async_run_episode to gymrunner to support envs with async step methods

    Summary: I need this because my reward evaluation is done by an async coroutine (multiple trajectories are being generated in parallel).
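A rough sketch of what an async episode runner enables (the env interface and function names below are assumptions for illustration, not ReAgent's gymrunner API):

```python
import asyncio


async def async_run_episode(env, policy, max_steps=100):
    """Roll out one episode against an environment whose step() is a coroutine.

    `env` is assumed to expose reset() and an async step(action) returning
    (obs, reward, done); `policy` maps an observation to an action.
    """
    obs = env.reset()
    rewards = []
    for _ in range(max_steps):
        obs, reward, done = await env.step(policy(obs))
        rewards.append(reward)
        if done:
            break
    return rewards


async def gather_episodes(env_factory, policy, n):
    # Because step() awaits, multiple trajectories can run concurrently.
    return await asyncio.gather(
        *(async_run_episode(env_factory(), policy) for _ in range(n))
    )
```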

    Differential Revision: D25487664

    Merged fb-exported cla signed Reverted 
    opened by alexnikulkov 16
  • Migrate REINFORCE trainer to Lightning

    Migrate REINFORCE trainer to Lightning

    Summary: I migrated the REINFORCE trainer to Lightning. I don't like the fake-optimizer trick and I'll look into doing it more cleanly.
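For context, the quantity a REINFORCE trainer minimizes is the negative sum of log-probabilities weighted by discounted returns. A framework-free sketch of that computation (illustrative only; the actual ReAgent trainer operates on tensors):

```python
def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed right-to-left."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Scalar loss whose gradient is the REINFORCE estimator:
    minimize -sum_t log pi(a_t | s_t) * G_t."""
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```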

    Differential Revision: D26246712

    Merged fb-exported cla signed 
    opened by alexnikulkov 14
  • Extend Gymrunner, add Transition and Trajectory

    Extend Gymrunner, add Transition and Trajectory

    Summary: Gymrunner is currently limited, which results in duplicated code when we try to replicate the previous gym environment's behavior, such as adding mdp_id and sequence_number to the replay buffer, or evaluating with gamma < 1.0. This change makes those behaviors easier to implement without code duplication.
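A minimal sketch of what Transition and Trajectory containers might look like (field names and methods are assumptions mirroring the summary, not the actual ReAgent classes):

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Transition:
    # Illustrative fields; mdp_id and sequence_number are the two the
    # summary mentions needing in the replay buffer.
    mdp_id: str
    sequence_number: int
    observation: Any
    action: Any
    reward: float
    terminal: bool


@dataclass
class Trajectory:
    transitions: List[Transition] = field(default_factory=list)

    def add(self, transition: Transition) -> None:
        self.transitions.append(transition)

    def discounted_return(self, gamma: float = 1.0) -> float:
        # Supports evaluating with gamma < 1.0, as mentioned in the summary.
        return sum(t.reward * gamma ** i for i, t in enumerate(self.transitions))
```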

    Differential Revision: D21616090

    Merged fb-exported 
    opened by kaiwenw 14
  • OOM killed

    OOM killed

    Hi,

    I ran dqn_workflow with 7.9 GB of training data, but the process was OOM-killed. Below are my environment and the OOM logs.

    workflow: dqn_workflow.py
    training_data: 8 features, 20,249,257 rows, 7.9 GB
    training_eval_data: 8 features, 2,028,916 rows, 0.8 GB
    RAM: 80 GB

    INFO:ml.rl.evaluation.evaluation_data_page:EvaluationDataPage minibatch size: 2028912
    WARNING:ml.rl.evaluation.doubly_robust_estimator:Can't normalize DR-CPE because of small or negative logged_policy_score
    Killed
    
    [Tue May  7 22:05:38 2019] python invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
    [Tue May  7 22:05:38 2019] python cpuset=42ee6ef8b84594988960735ef211ac05221059efc2d524f2afc1e2b49eb46d0c mems_allowed=0-1
    [Tue May  7 22:05:38 2019] CPU: 1 PID: 51997 Comm: python Tainted: P           O      4.20.13-1.el7.elrepo.x86_64 #1
    [Tue May  7 22:05:38 2019] Hardware name: Dell Inc. PowerEdge C4140/013M88, BIOS 1.6.11 11/21/2018
    [Tue May  7 22:05:38 2019] Call Trace:
    [Tue May  7 22:05:38 2019]  dump_stack+0x63/0x88
    [Tue May  7 22:05:38 2019]  dump_header+0x78/0x2a4
    [Tue May  7 22:05:38 2019]  ? mem_cgroup_scan_tasks+0x9c/0xf0
    [Tue May  7 22:05:38 2019]  oom_kill_process+0x26b/0x290
    [Tue May  7 22:05:38 2019]  out_of_memory+0x140/0x4b0
    [Tue May  7 22:05:38 2019]  mem_cgroup_out_of_memory+0x4b/0x80
    [Tue May  7 22:05:38 2019]  try_charge+0x6e2/0x750
    [Tue May  7 22:05:38 2019]  mem_cgroup_try_charge+0x8c/0x1e0
    [Tue May  7 22:05:38 2019]  __add_to_page_cache_locked+0x1a0/0x300
    [Tue May  7 22:05:38 2019]  ? scan_shadow_nodes+0x30/0x30
    [Tue May  7 22:05:38 2019]  add_to_page_cache_lru+0x4e/0xd0
    [Tue May  7 22:05:38 2019]  filemap_fault+0x428/0x7c0
    [Tue May  7 22:05:38 2019]  ? xas_find+0x138/0x1a0
    [Tue May  7 22:05:38 2019]  ? filemap_map_pages+0x153/0x3c0
    [Tue May  7 22:05:38 2019]  __do_fault+0x3e/0xc0
    [Tue May  7 22:05:38 2019]  __handle_mm_fault+0xbd6/0xe80
    [Tue May  7 22:05:38 2019]  handle_mm_fault+0x102/0x220
    [Tue May  7 22:05:38 2019]  __do_page_fault+0x21c/0x4c0
    [Tue May  7 22:05:38 2019]  do_page_fault+0x37/0x140
    [Tue May  7 22:05:38 2019]  ? page_fault+0x8/0x30
    [Tue May  7 22:05:38 2019]  page_fault+0x1e/0x30
    ...
    [Tue May  7 22:05:38 2019] Memory cgroup out of memory: Kill process 51997 (python) score 997 or sacrifice child
    [Tue May  7 22:05:38 2019] Killed process 51997 (python) total-vm:102757536kB, anon-rss:83335008kB, file-rss:132692kB, shmem-rss:8192kB
    [Tue May  7 22:05:42 2019] oom_reaper: reaped process 51997 (python), now anon-rss:0kB, file-rss:127188kB, shmem-rss:8192kB
    

    (Resource-usage chart omitted: green = CPU, yellow = RAM.)
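The log above shows the entire 2,028,916-row evaluation set treated as a single minibatch, so peak memory grows with dataset size. A generic mitigation (illustrative only, not a ReAgent API) is to stream the data in bounded chunks:

```python
def iter_chunks(path, chunk_rows=100_000):
    """Yield lists of at most chunk_rows lines from a text file, so that
    peak memory is proportional to the chunk size, not the file size."""
    chunk = []
    with open(path) as f:
        for line in f:
            chunk.append(line.rstrip("\n"))
            if len(chunk) >= chunk_rows:
                yield chunk
                chunk = []
    if chunk:
        yield chunk
```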

    opened by pjy953 14
  • Add sparse features to reward decomposition

    Add sparse features to reward decomposition

    Summary: A showcase of how to add sparse features to ReAgent.

    I mainly referenced two examples:

    1. fbsource/fbcode/minimal_viable_ai/ifr_uv/train.py
    2. fbsource/fbcode/torchrecipes/rec/dlrm_main_fb.py

    Differential Revision: D33225789

    fb-exported cla signed 
    opened by czxttkl 13
  • Cannot build RaspCli inside the cpu Dockerfile

    Cannot build RaspCli inside the cpu Dockerfile

    I followed the installation guide for Docker cpu and can successfully go inside the container and run the tests.

    The tutorial has a section that says to start an RP server, but I can't do this because I don't have ./serving/build/RaspCli.

    To build RaspCli I ran the following commands inside ./serving/build:

    wget https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.3.0%2Bcpu.zip -O libtorch.zip && \
        unzip libtorch.zip && \
        rm libtorch.zip
    
    conda install glog
    
    apt -y install libgflags-dev \
                       libgoogle-glog-dev \
                       libboost-tools-dev \
                       libboost-thread1.62-dev
    
    cmake -DCMAKE_PREFIX_PATH=$HOME/libtorch ..
    make
    

    The build fails with many "multiple definition" errors. Here is an excerpt:

    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o): In function `__morestack_large_model':
    (.text+0x11c): multiple definition of `__morestack_large_model'
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o):(.text+0x11c): first defined here
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o): In function `__stack_split_initialize':
    (.text+0x12c): multiple definition of `__stack_split_initialize'
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o):(.text+0x12c): first defined here
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o): In function `__morestack_get_guard':
    (.text+0x155): multiple definition of `__morestack_get_guard'
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o):(.text+0x155): first defined here
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o): In function `__morestack_set_guard':
    (.text+0x15f): multiple definition of `__morestack_set_guard'
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o):(.text+0x15f): first defined here
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o): In function `__morestack_make_guard':
    (.text+0x169): multiple definition of `__morestack_make_guard'
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o):(.text+0x169): first defined here
    collect2: error: ld returned 1 exit status
    CMakeFiles/RaspCli.dir/build.make:106: recipe for target 'RaspCli' failed
    make[2]: *** [RaspCli] Error 1
    CMakeFiles/Makefile2:110: recipe for target 'CMakeFiles/RaspCli.dir/all' failed
    make[1]: *** [CMakeFiles/RaspCli.dir/all] Error 2
    Makefile:140: recipe for target 'all' failed
    make: *** [all] Error 2
    

    How can I successfully build ReAgent inside the CPU Docker container?

    I am running Ubuntu 18.04.3 LTS.

    opened by hsm207 13
  • Getting an error running a spark-submit job

    Getting an error running a spark-submit job

    Hello,

    I am trying to follow the instructions here: https://github.com/facebookresearch/Horizon/blob/master/docs/usage.md

    When I run this script:

    /usr/local/spark/bin/spark-submit \
        --class com.facebook.spark.rl.Preprocessor \
        preprocessing/target/rl-preprocessing-1.1.jar \
        "`cat ml/rl/workflow/sample_configs/discrete_action/timeline.json`"

    I get:

    2019-02-27 00:57:03 INFO HiveMetaStore:746 - 0: get_database: global_temp
    2019-02-27 00:57:03 INFO audit:371 - ugi=root ip=unknown-ip-addr cmd=get_database: global_temp
    2019-02-27 00:57:03 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
    Exception in thread "main" org.apache.spark.sql.AnalysisException: grouping expressions sequence is empty, and 'source_table.mdp_id' is not an aggregate function. Wrap '()' in windowing function(s) or wrap 'source_table.mdp_id' in first() (or first_value) if you don't care which value you get.;;
    'Sort ['HASH('mdp_id, 'sequence_number) ASC NULLS FIRST], false
    +- 'RepartitionByExpression ['HASH('mdp_id, 'sequence_number)], 200
    +- 'Project [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, next_state_features#24, next_action#25, sequence_number#2, sequence_number_ordinal#26, time_diff#27, possible_actions#7, possible_next_actions#28, metrics#8]
    +- 'Project [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, sequence_number#2, possible_actions#7, metrics#8, next_state_features#24, next_action#25, sequence_number_ordinal#26, _we3#30, possible_next_actions#28, next_state_features#24, next_action#25, sequence_number_ordinal#26, (coalesce(_we3#30, sequence_number#2) - sequence_number#2) AS time_diff#27, possible_next_actions#28]
    +- 'Window [lead(state_features#4, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS next_state_features#24, lead(action#5, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS next_action#25, row_number() windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS sequence_number_ordinal#26, lead(sequence_number#2, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS _we3#30, lead(possible_actions#7, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS possible_next_actions#28], [mdp_id#1], [mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST]
    +- 'Filter isnotnull('next_state_features)
    +- Aggregate [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, sequence_number#2, possible_actions#7, metrics#8]
    +- SubqueryAlias source_table
    +- Project [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, sequence_number#2, possible_actions#7, metrics#8]
    +- Filter ((ds#0 >= 2019-01-01) && (ds#0 <= 2019-01-01))
    +- SubqueryAlias cartpole_discrete
    +- Relation[ds#0,mdp_id#1,sequence_number#2,action_probability#3,state_features#4,action#5,reward#6,possible_actions#7,metrics#8] json

    I tried the steps after manually installing HBase (this step is missing from the documentation; please let me know if you want me to add it).

    I am following the Docker-on-Mac instructions (https://github.com/facebookresearch/Horizon/blob/master/docs/installation.md) to get going. Can anyone help me move forward?

    opened by Jagdish007 13
  • Fix matrix inverse for joint LinUCB

    Fix matrix inverse for joint LinUCB

    Summary: This is a copy of D42322767 to fix the inverse of an ill-conditioned matrix for joint LinUCB.
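A standard remedy when a LinUCB-style covariance matrix is nearly singular is to add a small ridge term and use a linear solver rather than forming an explicit inverse. A sketch of that idea (illustrative only; not the actual fix in D42334903):

```python
import numpy as np


def stable_solve(a: np.ndarray, b: np.ndarray, ridge: float = 1e-6) -> np.ndarray:
    """Solve a @ x = b for a possibly ill-conditioned PSD matrix `a`.

    Adding ridge * I to the diagonal bounds the condition number, and
    np.linalg.solve avoids the extra error of computing a full inverse.
    """
    a = a + ridge * np.eye(a.shape[0])
    return np.linalg.solve(a, b)
```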

    Reviewed By: alexnikulkov

    Differential Revision: D42334903

    fb-exported cla signed 
    opened by alexnikulkov 1
  • Mask out non-present arm scores for Offline Eval

    Mask out non-present arm scores for Offline Eval

    Summary: When some arms might be missing, apply a masked softmax to model scores during offline eval to avoid selecting the missing arms.
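A masked softmax assigns exactly zero probability to absent arms by treating their scores as negative infinity. A minimal sketch of the idea (function name and interface are illustrative, not ReAgent's):

```python
import math


def masked_softmax(scores, present):
    """Softmax over `scores`, assigning probability 0 to arms whose
    `present` flag is False, so missing arms can never be selected."""
    # Subtract the max over present arms for numerical stability.
    m = max(s for s, p in zip(scores, present) if p)
    exps = [math.exp(s - m) if p else 0.0 for s, p in zip(scores, present)]
    total = sum(exps)
    return [e / total for e in exps]
```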

    Differential Revision: D41990957

    opened by alexnikulkov 0
  • SlateQ agent [Q & A]

    SlateQ agent [Q & A]

    Hi @kittipatv, is there any resource for understanding the SlateQ algorithm implemented here, apart from the original paper? I am exploring it for a recommendation problem but have some doubts about it.

    opened by getsanjeevdubey 1
  • SlateQ agent implementation

    SlateQ agent implementation

    Is the use of next_state in the next_q_values calculation of the SlateQ agent deliberate? https://github.com/facebookresearch/ReAgent/blob/main/reagent/training/slate_q_trainer.py#L230

    The SlateQ agent implemented by the SlateQ paper authors in RecSim uses state instead of next_state from the replay buffer to compute next_q_values - https://github.com/google-research/recsim/issues/26

    opened by rahul-zomato 0
Owner
Facebook Research
Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".

SLM Lab Modular Deep Reinforcement Learning framework in PyTorch. Documentation: https://slm-lab.gitbook.io/slm-lab/ BeamRider Breakout KungFuMaster M

Wah Loon Keng 1.1k Dec 24, 2022
Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms

Coach Coach is a python reinforcement learning framework containing implementation of many state-of-the-art algorithms. It exposes a set of easy-to-us

Intel Labs 2.2k Jan 5, 2023
Doom-based AI Research Platform for Reinforcement Learning from Raw Visual Information. :godmode:

ViZDoom ViZDoom allows developing AI bots that play Doom using only the visual information (the screen buffer). It is primarily intended for research

Marek Wydmuch 1.5k Dec 30, 2022
A toolkit for developing and comparing reinforcement learning algorithms.

Status: Maintenance (expect bug fixes and minor updates) OpenAI Gym OpenAI Gym is a toolkit for developing and comparing reinforcement learning algori

OpenAI 29.6k Jan 1, 2023
A toolkit for reproducible reinforcement learning research.

garage garage is a toolkit for developing and evaluating reinforcement learning algorithms, and an accompanying library of state-of-the-art implementa

Reinforcement Learning Working Group 1.6k Jan 9, 2023
An open source robotics benchmark for meta- and multi-task reinforcement learning

Meta-World Meta-World is an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic

Reinforcement Learning Working Group 823 Jan 6, 2023
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Status: Maintenance (expect bug fixes and minor updates) Baselines OpenAI Baselines is a set of high-quality implementations of reinforcement learning

OpenAI 13.5k Jan 7, 2023
A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

Stable Baselines Stable Baselines is a set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines. You can read a

Ashley Hill 3.7k Jan 1, 2023
Tensorforce: a TensorFlow library for applied reinforcement learning

Tensorforce: a TensorFlow library for applied reinforcement learning Introduction Tensorforce is an open-source deep reinforcement learning framework,

Tensorforce 3.2k Jan 2, 2023
TensorFlow Reinforcement Learning

TRFL TRFL (pronounced "truffle") is a library built on top of TensorFlow that exposes several useful building blocks for implementing Reinforcement Le

DeepMind 3.1k Dec 29, 2022
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

Dopamine Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. It aims to fill the need for a small, easily grok

Google 10k Jan 7, 2023
Deep Reinforcement Learning for Keras.

Deep Reinforcement Learning for Keras What is it? keras-rl implements some state-of-the art deep reinforcement learning algorithms in Python and seaml

Keras-RL 5.4k Jan 4, 2023
ChainerRL is a deep reinforcement learning library built on top of Chainer.

ChainerRL ChainerRL is a deep reinforcement learning library that implements various state-of-the-art deep reinforcement algorithms in Python using Ch

Chainer 1.1k Dec 26, 2022
Open world survival environment for reinforcement learning

Crafter Open world survival environment for reinforcement learning. Highlights Crafter is a procedurally generated 2D world, where the agent finds foo

Danijar Hafner 213 Jan 5, 2023
Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

MARL Tricks Our codes for RIIT: Rethinking the Importance of Implementation Tricks in Multi-AgentReinforcement Learning. We implemented and standardiz

null 404 Dec 25, 2022
Paddle-RLBooks is a reinforcement learning code study guide based on pure PaddlePaddle.

Paddle-RLBooks Welcome to Paddle-RLBooks which is a reinforcement learning code study guide based on pure PaddlePaddle. 欢迎来到Paddle-RLBooks,该仓库主要是针对强化学

AgentMaker 117 Dec 12, 2022
A customisable 3D platform for agent-based AI research

DeepMind Lab is a 3D learning environment based on id Software's Quake III Arena via ioquake3 and other open source software. DeepMind Lab provides a

DeepMind 6.8k Jan 5, 2023
TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning. TF-Agents makes implementing, de

null 2.4k Dec 29, 2022
Source code and data from the RecSys 2020 article "Carousel Personalization in Music Streaming Apps with Contextual Bandits" by W. Bendada, G. Salha and T. Bontempelli

Carousel Personalization in Music Streaming Apps with Contextual Bandits - RecSys 2020 This repository provides Python code and data to reproduce expe

Deezer 48 Jan 2, 2023