A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)

Applied Reinforcement Learning @ Facebook

Overview

ReAgent is an open source end-to-end platform for applied reinforcement learning (RL) developed and used at Facebook. ReAgent is built in Python and uses PyTorch for modeling and training and TorchScript for model serving. The platform contains workflows to train popular deep RL algorithms and includes data preprocessing, feature transformation, distributed training, counterfactual policy evaluation, and optimized serving. For more detailed information about ReAgent see the white paper here.

The platform was previously named "Horizon", but we adopted the name "ReAgent" to emphasize its broader scope in decision making and reasoning.

Algorithms Supported

Installation

ReAgent can be installed via Docker or manually. Detailed instructions on how to install ReAgent can be found here.

Usage

Detailed instructions on how to use ReAgent Models can be found here.

The ReAgent Serving Platform (RASP) tutorial is available here.

License

ReAgent is released under a BSD 3-Clause license. Find out more about it here.

Citing

@article{gauci2018horizon,
  title={Horizon: Facebook's Open Source Applied Reinforcement Learning Platform},
  author={Gauci, Jason and Conti, Edoardo and Liang, Yitao and Virochsiri, Kittipat and Chen, Zhengxing and He, Yuchen and Kaden, Zachary and Narayanan, Vivek and Ye, Xiaohui},
  journal={arXiv preprint arXiv:1811.00260},
  year={2018}
}

Comments
  • Upgrade ReAgent to use Python 3.8

    Upgrade ReAgent to use Python 3.8

    Summary: Currently, we have some test failures (https://app.circleci.com/pipelines/github/facebookresearch/ReAgent/1460/workflows/ecc21254-779b-4a89-a40d-ea317e839d96/jobs/8655) because we are missing some recent language features.

    Differential Revision: D26977836

    Merged fb-exported cla signed 
    opened by czxttkl 38
  • Reimplement MDNRNN using new gym.

    Reimplement MDNRNN using new gym.

    Using our new gym, test MDNRNN feature importance/sensitivity. Also, train DQN to play POMDP string game with states embedded with MDNRNN. This is in preparation to nuke old gym folder.

    Merged 
    opened by kaiwenw 23
  • Lightning SACTrainer

    Lightning SACTrainer

    Summary:

    • Created ReAgentLightningModule as a base class to implement the generator API
    • Implemented reporting for SAC

    TODOs:

    • Convert TD3 to LightningModule
    • Fix the OSS version of model manager
    • Fix on-policy training with Gym (by creating GymDataModule)

    Differential Revision: D23857511

    Merged fb-exported cla signed 
    opened by kittipatv 22
  • add env flag to skip frozen registry check

    add env flag to skip frozen registry check

    Summary: The environment variable SKIP_FROZEN_REGISTRY_CHECK is checked. If it is set to a non-zero value, we print a warning instead of raising an error when attempting to add members to a frozen registry.
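A minimal sketch of this warn-instead-of-raise pattern (function and registry names here are illustrative, not the actual ReAgent code):

```python
import os
import warnings


def check_frozen(registry_name: str) -> None:
    """Raise when adding members to a frozen registry, unless the
    SKIP_FROZEN_REGISTRY_CHECK environment variable is set to a
    non-zero value, in which case only warn."""
    skip = os.environ.get("SKIP_FROZEN_REGISTRY_CHECK", "0") != "0"
    if skip:
        warnings.warn(
            f"Adding a member to frozen registry '{registry_name}'; "
            "proceeding because SKIP_FROZEN_REGISTRY_CHECK is set."
        )
    else:
        raise RuntimeError(f"Registry '{registry_name}' is frozen")
```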

    Differential Revision: D32773682

    fb-exported cla signed 
    opened by alexnikulkov 17
  • add async_run_episode to gymrunner to support envs with async step methods

    add async_run_episode to gymrunner to support envs with async step methods

    Summary: I need this because my reward evaluation is done by an async coroutine (multiple trajectories are being generated in parallel).
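A rough sketch of what an async episode runner enables (the env interface and function names below are assumptions for illustration, not ReAgent's gymrunner API):

```python
import asyncio


async def async_run_episode(env, policy, max_steps=100):
    """Roll out one episode against an environment whose step() is a coroutine.

    `env` is assumed to expose reset() and an async step(action) returning
    (obs, reward, done); `policy` maps an observation to an action.
    """
    obs = env.reset()
    rewards = []
    for _ in range(max_steps):
        obs, reward, done = await env.step(policy(obs))
        rewards.append(reward)
        if done:
            break
    return rewards


async def gather_episodes(env_factory, policy, n):
    # Because step() awaits, multiple trajectories can run concurrently.
    return await asyncio.gather(
        *(async_run_episode(env_factory(), policy) for _ in range(n))
    )
```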

    Differential Revision: D25487664

    Merged fb-exported cla signed Reverted 
    opened by alexnikulkov 16
  • Migrate REINFORCE trainer to Lightning

    Migrate REINFORCE trainer to Lightning

    Summary: I migrated the REINFORCE trainer to Lightning. I don't like the fake-optimizer trick and I'll look into doing it more cleanly.
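For context, the quantity a REINFORCE trainer minimizes is the negative sum of log-probabilities weighted by discounted returns. A framework-free sketch of that computation (illustrative only; the actual ReAgent trainer operates on tensors):

```python
def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed right-to-left."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Scalar loss whose gradient is the REINFORCE estimator:
    minimize -sum_t log pi(a_t | s_t) * G_t."""
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```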

    Differential Revision: D26246712

    Merged fb-exported cla signed 
    opened by alexnikulkov 14
  • Extend Gymrunner, add Transition and Trajectory

    Extend Gymrunner, add Transition and Trajectory

    Summary: Gymrunner is currently limited, which results in duplicated code when we try to replicate the previous gym environment's behavior, such as adding mdp_id and sequence_number to the replay buffer, or evaluating with gamma < 1.0. This change makes those behaviors easier to implement without code duplication.
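A minimal sketch of what Transition and Trajectory containers might look like (field names and methods are assumptions mirroring the summary, not the actual ReAgent classes):

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Transition:
    # Illustrative fields; mdp_id and sequence_number are the two the
    # summary mentions needing in the replay buffer.
    mdp_id: str
    sequence_number: int
    observation: Any
    action: Any
    reward: float
    terminal: bool


@dataclass
class Trajectory:
    transitions: List[Transition] = field(default_factory=list)

    def add(self, transition: Transition) -> None:
        self.transitions.append(transition)

    def discounted_return(self, gamma: float = 1.0) -> float:
        # Supports evaluating with gamma < 1.0, as mentioned in the summary.
        return sum(t.reward * gamma ** i for i, t in enumerate(self.transitions))
```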

    Differential Revision: D21616090

    Merged fb-exported 
    opened by kaiwenw 14
  • OOM killed

    OOM killed

    Hi,

    I ran dqn_workflow with 7.9 GB of training data, but the process was OOM-killed. Below are my environment and the OOM logs.

    workflow: dqn_workflow.py
    training_data: 8 features, 20,249,257 rows, 7.9 GB
    training_eval_data: 8 features, 2,028,916 rows, 0.8 GB
    RAM: 80 GB

    INFO:ml.rl.evaluation.evaluation_data_page:EvaluationDataPage minibatch size: 2028912
    WARNING:ml.rl.evaluation.doubly_robust_estimator:Can't normalize DR-CPE because of small or negative logged_policy_score
    Killed
    
    [Tue May  7 22:05:38 2019] python invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
    [Tue May  7 22:05:38 2019] python cpuset=42ee6ef8b84594988960735ef211ac05221059efc2d524f2afc1e2b49eb46d0c mems_allowed=0-1
    [Tue May  7 22:05:38 2019] CPU: 1 PID: 51997 Comm: python Tainted: P           O      4.20.13-1.el7.elrepo.x86_64 #1
    [Tue May  7 22:05:38 2019] Hardware name: Dell Inc. PowerEdge C4140/013M88, BIOS 1.6.11 11/21/2018
    [Tue May  7 22:05:38 2019] Call Trace:
    [Tue May  7 22:05:38 2019]  dump_stack+0x63/0x88
    [Tue May  7 22:05:38 2019]  dump_header+0x78/0x2a4
    [Tue May  7 22:05:38 2019]  ? mem_cgroup_scan_tasks+0x9c/0xf0
    [Tue May  7 22:05:38 2019]  oom_kill_process+0x26b/0x290
    [Tue May  7 22:05:38 2019]  out_of_memory+0x140/0x4b0
    [Tue May  7 22:05:38 2019]  mem_cgroup_out_of_memory+0x4b/0x80
    [Tue May  7 22:05:38 2019]  try_charge+0x6e2/0x750
    [Tue May  7 22:05:38 2019]  mem_cgroup_try_charge+0x8c/0x1e0
    [Tue May  7 22:05:38 2019]  __add_to_page_cache_locked+0x1a0/0x300
    [Tue May  7 22:05:38 2019]  ? scan_shadow_nodes+0x30/0x30
    [Tue May  7 22:05:38 2019]  add_to_page_cache_lru+0x4e/0xd0
    [Tue May  7 22:05:38 2019]  filemap_fault+0x428/0x7c0
    [Tue May  7 22:05:38 2019]  ? xas_find+0x138/0x1a0
    [Tue May  7 22:05:38 2019]  ? filemap_map_pages+0x153/0x3c0
    [Tue May  7 22:05:38 2019]  __do_fault+0x3e/0xc0
    [Tue May  7 22:05:38 2019]  __handle_mm_fault+0xbd6/0xe80
    [Tue May  7 22:05:38 2019]  handle_mm_fault+0x102/0x220
    [Tue May  7 22:05:38 2019]  __do_page_fault+0x21c/0x4c0
    [Tue May  7 22:05:38 2019]  do_page_fault+0x37/0x140
    [Tue May  7 22:05:38 2019]  ? page_fault+0x8/0x30
    [Tue May  7 22:05:38 2019]  page_fault+0x1e/0x30
    ...
    [Tue May  7 22:05:38 2019] Memory cgroup out of memory: Kill process 51997 (python) score 997 or sacrifice child
    [Tue May  7 22:05:38 2019] Killed process 51997 (python) total-vm:102757536kB, anon-rss:83335008kB, file-rss:132692kB, shmem-rss:8192kB
    [Tue May  7 22:05:42 2019] oom_reaper: reaped process 51997 (python), now anon-rss:0kB, file-rss:127188kB, shmem-rss:8192kB
    

    (Resource-usage chart omitted: green = CPU, yellow = RAM.)
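The log above shows the entire 2,028,916-row evaluation set treated as a single minibatch, so peak memory grows with dataset size. A generic mitigation (illustrative only, not a ReAgent API) is to stream the data in bounded chunks:

```python
def iter_chunks(path, chunk_rows=100_000):
    """Yield lists of at most chunk_rows lines from a text file, so that
    peak memory is proportional to the chunk size, not the file size."""
    chunk = []
    with open(path) as f:
        for line in f:
            chunk.append(line.rstrip("\n"))
            if len(chunk) >= chunk_rows:
                yield chunk
                chunk = []
    if chunk:
        yield chunk
```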

    opened by pjy953 14
  • Add sparse features to reward decomposition

    Add sparse features to reward decomposition

    Summary: A showcase of how to add sparse features to ReAgent.

    I mainly referenced two examples:

    1. fbsource/fbcode/minimal_viable_ai/ifr_uv/train.py
    2. fbsource/fbcode/torchrecipes/rec/dlrm_main_fb.py

    Differential Revision: D33225789

    fb-exported cla signed 
    opened by czxttkl 13
  • Cannot build RaspCli inside the cpu Dockerfile

    Cannot build RaspCli inside the cpu Dockerfile

    I followed the installation guide for Docker cpu and can successfully go inside the container and run the tests.

    The tutorial has a section that says to start an RP server, but I can't do this because I don't have ./serving/build/RaspCli.

    To build RaspCli I ran the following commands inside ./serving/build:

    wget https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.3.0%2Bcpu.zip -O libtorch.zip && \
        unzip libtorch.zip && \
        rm libtorch.zip
    
    conda install glog
    
    apt -y install libgflags-dev \
                       libgoogle-glog-dev \
                       libboost-tools-dev \
                       libboost-thread1.62-dev
    
    cmake -DCMAKE_PREFIX_PATH=$HOME/libtorch ..
    make
    

    The build fails with many "multiple definition" errors. Here is an excerpt:

    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o): In function `__morestack_large_model':
    (.text+0x11c): multiple definition of `__morestack_large_model'
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o):(.text+0x11c): first defined here
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o): In function `__stack_split_initialize':
    (.text+0x12c): multiple definition of `__stack_split_initialize'
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o):(.text+0x12c): first defined here
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o): In function `__morestack_get_guard':
    (.text+0x155): multiple definition of `__morestack_get_guard'
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o):(.text+0x155): first defined here
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o): In function `__morestack_set_guard':
    (.text+0x15f): multiple definition of `__morestack_set_guard'
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o):(.text+0x15f): first defined here
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o): In function `__morestack_make_guard':
    (.text+0x169): multiple definition of `__morestack_make_guard'
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o):(.text+0x169): first defined here
    collect2: error: ld returned 1 exit status
    CMakeFiles/RaspCli.dir/build.make:106: recipe for target 'RaspCli' failed
    make[2]: *** [RaspCli] Error 1
    CMakeFiles/Makefile2:110: recipe for target 'CMakeFiles/RaspCli.dir/all' failed
    make[1]: *** [CMakeFiles/RaspCli.dir/all] Error 2
    Makefile:140: recipe for target 'all' failed
    make: *** [all] Error 2
    

    How can I successfully build ReAgent inside the CPU Docker container?

    I am running Ubuntu 18.04.3 LTS.

    opened by hsm207 13
  • Getting an error running a spark-submit job

    Getting an error running a spark-submit job

    Hello,

    I am trying to follow the instructions here: https://github.com/facebookresearch/Horizon/blob/master/docs/usage.md

    When I run this script:

    /usr/local/spark/bin/spark-submit \
        --class com.facebook.spark.rl.Preprocessor \
        preprocessing/target/rl-preprocessing-1.1.jar \
        "`cat ml/rl/workflow/sample_configs/discrete_action/timeline.json`"

    I get:

    2019-02-27 00:57:03 INFO HiveMetaStore:746 - 0: get_database: global_temp
    2019-02-27 00:57:03 INFO audit:371 - ugi=root ip=unknown-ip-addr cmd=get_database: global_temp
    2019-02-27 00:57:03 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
    Exception in thread "main" org.apache.spark.sql.AnalysisException: grouping expressions sequence is empty, and 'source_table.mdp_id' is not an aggregate function. Wrap '()' in windowing function(s) or wrap 'source_table.mdp_id' in first() (or first_value) if you don't care which value you get.;;
    'Sort ['HASH('mdp_id, 'sequence_number) ASC NULLS FIRST], false
    +- 'RepartitionByExpression ['HASH('mdp_id, 'sequence_number)], 200
    +- 'Project [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, next_state_features#24, next_action#25, sequence_number#2, sequence_number_ordinal#26, time_diff#27, possible_actions#7, possible_next_actions#28, metrics#8]
    +- 'Project [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, sequence_number#2, possible_actions#7, metrics#8, next_state_features#24, next_action#25, sequence_number_ordinal#26, _we3#30, possible_next_actions#28, next_state_features#24, next_action#25, sequence_number_ordinal#26, (coalesce(_we3#30, sequence_number#2) - sequence_number#2) AS time_diff#27, possible_next_actions#28]
    +- 'Window [lead(state_features#4, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS next_state_features#24, lead(action#5, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS next_action#25, row_number() windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS sequence_number_ordinal#26, lead(sequence_number#2, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS _we3#30, lead(possible_actions#7, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS possible_next_actions#28], [mdp_id#1], [mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST]
    +- 'Filter isnotnull('next_state_features)
    +- Aggregate [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, sequence_number#2, possible_actions#7, metrics#8]
    +- SubqueryAlias source_table
    +- Project [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, sequence_number#2, possible_actions#7, metrics#8]
    +- Filter ((ds#0 >= 2019-01-01) && (ds#0 <= 2019-01-01))
    +- SubqueryAlias cartpole_discrete
    +- Relation[ds#0,mdp_id#1,sequence_number#2,action_probability#3,state_features#4,action#5,reward#6,possible_actions#7,metrics#8] json

    I tried the steps after manually installing HBase (this step is missing from the documentation; please let me know if you want me to add it).

    I am following the Docker-on-Mac instructions (https://github.com/facebookresearch/Horizon/blob/master/docs/installation.md) to get going. Can anyone help me move forward?

    opened by Jagdish007 13
  • Fix matrix inverse for joint LinUCB

    Fix matrix inverse for joint LinUCB

    Summary: This is a copy of D42322767 to fix the inverse of an ill-conditioned matrix for joint LinUCB.
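A standard remedy when a LinUCB-style covariance matrix is nearly singular is to add a small ridge term and use a linear solver rather than forming an explicit inverse. A sketch of that idea (illustrative only; not the actual fix in D42334903):

```python
import numpy as np


def stable_solve(a: np.ndarray, b: np.ndarray, ridge: float = 1e-6) -> np.ndarray:
    """Solve a @ x = b for a possibly ill-conditioned PSD matrix `a`.

    Adding ridge * I to the diagonal bounds the condition number, and
    np.linalg.solve avoids the extra error of computing a full inverse.
    """
    a = a + ridge * np.eye(a.shape[0])
    return np.linalg.solve(a, b)
```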

    Reviewed By: alexnikulkov

    Differential Revision: D42334903

    fb-exported cla signed 
    opened by alexnikulkov 1
  • Mask out non-present arm scores for Offline Eval

    Mask out non-present arm scores for Offline Eval

    Summary: When some arms might be missing, apply a masked softmax to model scores during offline eval to avoid selecting the missing arms.
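A masked softmax assigns exactly zero probability to absent arms by treating their scores as negative infinity. A minimal sketch of the idea (function name and interface are illustrative, not ReAgent's):

```python
import math


def masked_softmax(scores, present):
    """Softmax over `scores`, assigning probability 0 to arms whose
    `present` flag is False, so missing arms can never be selected."""
    # Subtract the max over present arms for numerical stability.
    m = max(s for s, p in zip(scores, present) if p)
    exps = [math.exp(s - m) if p else 0.0 for s, p in zip(scores, present)]
    total = sum(exps)
    return [e / total for e in exps]
```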

    Differential Revision: D41990957

    opened by alexnikulkov 0
  • SlateQ agent [Q & A]

    SlateQ agent [Q & A]

    Hi @kittipatv, is there any resource for understanding the SlateQ algorithm implemented here, apart from the original paper? I am exploring it for a recommendation problem but have some doubts about it.

    opened by getsanjeevdubey 1
  • SlateQ agent implementation

    SlateQ agent implementation

    Is the use of next_state in the next_q_values calculation of the SlateQ agent deliberate? https://github.com/facebookresearch/ReAgent/blob/main/reagent/training/slate_q_trainer.py#L230

    The SlateQ agent implemented by the SlateQ paper authors in RecSim uses state instead of next_state from the replay buffer to compute next_q_values - https://github.com/google-research/recsim/issues/26

    opened by rahul-zomato 0
Owner
Facebook Research
Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".

SLM Lab Modular Deep Reinforcement Learning framework in PyTorch. Documentation: https://slm-lab.gitbook.io/slm-lab/ BeamRider Breakout KungFuMaster M

Wah Loon Keng 1.1k Dec 24, 2022
Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms

Coach Coach is a python reinforcement learning framework containing implementation of many state-of-the-art algorithms. It exposes a set of easy-to-us

Intel Labs 2.2k Jan 5, 2023
Doom-based AI Research Platform for Reinforcement Learning from Raw Visual Information. :godmode:

ViZDoom ViZDoom allows developing AI bots that play Doom using only the visual information (the screen buffer). It is primarily intended for research

Marek Wydmuch 1.5k Dec 30, 2022
A toolkit for developing and comparing reinforcement learning algorithms.

Status: Maintenance (expect bug fixes and minor updates) OpenAI Gym OpenAI Gym is a toolkit for developing and comparing reinforcement learning algori

OpenAI 29.6k Jan 1, 2023
A toolkit for reproducible reinforcement learning research.

garage garage is a toolkit for developing and evaluating reinforcement learning algorithms, and an accompanying library of state-of-the-art implementa

Reinforcement Learning Working Group 1.6k Jan 9, 2023
An open source robotics benchmark for meta- and multi-task reinforcement learning

Meta-World Meta-World is an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic

Reinforcement Learning Working Group 823 Jan 6, 2023
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Status: Maintenance (expect bug fixes and minor updates) Baselines OpenAI Baselines is a set of high-quality implementations of reinforcement learning

OpenAI 13.5k Jan 7, 2023
A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

Stable Baselines Stable Baselines is a set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines. You can read a

Ashley Hill 3.7k Jan 1, 2023
Tensorforce: a TensorFlow library for applied reinforcement learning

Tensorforce: a TensorFlow library for applied reinforcement learning Introduction Tensorforce is an open-source deep reinforcement learning framework,

Tensorforce 3.2k Jan 2, 2023
TensorFlow Reinforcement Learning

TRFL TRFL (pronounced "truffle") is a library built on top of TensorFlow that exposes several useful building blocks for implementing Reinforcement Le

DeepMind 3.1k Dec 29, 2022
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

Dopamine Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. It aims to fill the need for a small, easily grok

Google 10k Jan 7, 2023
Deep Reinforcement Learning for Keras.

Deep Reinforcement Learning for Keras What is it? keras-rl implements some state-of-the art deep reinforcement learning algorithms in Python and seaml

Keras-RL 5.4k Jan 4, 2023
ChainerRL is a deep reinforcement learning library built on top of Chainer.

ChainerRL ChainerRL is a deep reinforcement learning library that implements various state-of-the-art deep reinforcement algorithms in Python using Ch

Chainer 1.1k Dec 26, 2022
Open world survival environment for reinforcement learning

Crafter Open world survival environment for reinforcement learning. Highlights Crafter is a procedurally generated 2D world, where the agent finds foo

Danijar Hafner 213 Jan 5, 2023
Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

MARL Tricks Our codes for RIIT: Rethinking the Importance of Implementation Tricks in Multi-AgentReinforcement Learning. We implemented and standardiz

null 404 Dec 25, 2022
Paddle-RLBooks is a reinforcement learning code study guide based on pure PaddlePaddle.

Paddle-RLBooks Welcome to Paddle-RLBooks which is a reinforcement learning code study guide based on pure PaddlePaddle. 欢迎来到Paddle-RLBooks,该仓库主要是针对强化学

AgentMaker 117 Dec 12, 2022
A customisable 3D platform for agent-based AI research

DeepMind Lab is a 3D learning environment based on id Software's Quake III Arena via ioquake3 and other open source software. DeepMind Lab provides a

DeepMind 6.8k Jan 5, 2023
TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning. TF-Agents makes implementing, de

null 2.4k Dec 29, 2022
Source code and data from the RecSys 2020 article "Carousel Personalization in Music Streaming Apps with Contextual Bandits" by W. Bendada, G. Salha and T. Bontempelli

Carousel Personalization in Music Streaming Apps with Contextual Bandits - RecSys 2020 This repository provides Python code and data to reproduce expe

Deezer 48 Jan 2, 2023