A lightweight library for sequential learning agents, including reinforcement learning

Overview

SaLinA: A Flexible and Simple Library for Learning Sequential Agents (including Reinforcement Learning)

TL;DR

salina is a lightweight library extending PyTorch modules for developing sequential decision models. It can be used for Reinforcement Learning (including model-based RL with differentiable environments, multi-agent RL, ...), but also in supervised/unsupervised learning settings (for instance for NLP, Computer Vision, etc.).

  • It lets you write very complex sequential models (or policies) in a few lines
  • It works on multiple CPUs and GPUs
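
To make this concrete, here is a minimal sketch of the core abstraction. It is illustrative rather than code from the repo: it assumes that Agent and Workspace are importable from the top-level salina package and that, inside forward, an agent reads and writes workspace variables with self.get(("name", t)) and self.set(("name", t), value), as in the paper figures and the issue snippets further down. The agent and variable names are made up.

    import torch
    import torch.nn as nn
    from salina import Agent, Workspace

    class ArgmaxPolicyAgent(Agent):
        """Illustrative agent: an nn.Module that reads "obs" at time t and writes an "action"."""

        def __init__(self, obs_size, n_actions):
            super().__init__()
            self.scores = nn.Linear(obs_size, n_actions)

        def forward(self, t, **kwargs):
            obs = self.get(("obs", t))                             # read from the shared workspace
            self.set(("action", t), self.scores(obs).argmax(-1))   # write back to it

    workspace = Workspace()
    workspace.set("obs", 0, torch.randn(5, 4))       # a batch of 5 observations at t=0
    agent = ArgmaxPolicyAgent(obs_size=4, n_actions=2)
    agent(workspace, t=0)                            # the workspace now contains "action" at t=0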

Quick Start

  • Just clone the repo

Documentation

For development, set up pre-commit hooks:

  • Run pip install pre-commit
    • or conda install -c conda-forge pre-commit
    • or brew install pre-commit
  • In the top directory of the repo, run pre-commit install to set up the git hook scripts
  • Now pre-commit will run automatically on git commit!
  • Currently isort, black and blacken-docs are used, in that order

Organization of the repo

Dependencies

salina makes use of pytorch, hydra (for configuring experiments), and gym (for the reinforcement learning algorithms).
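
As a rough illustration of how configuration works, here is a hedged sketch of salina's instantiate_class helper (its existence and single-argument call come from the tracebacks quoted in the issues below; the "classname" key format is an assumption). In real experiments the dictionary would come from a hydra YAML config (e.g. cfg.action_agent) rather than being written inline, and the target class would be an agent or environment constructor rather than the toy class defined here.

    from salina import instantiate_class

    class DemoComponent:
        """Toy stand-in for an agent or environment class referenced from a config."""

        def __init__(self, value):
            self.value = value

    # In practice this dict is a hydra config entry such as cfg.action_agent.
    cfg = {"classname": "__main__.DemoComponent", "value": 3}
    component = instantiate_class(cfg)   # roughly equivalent to DemoComponent(value=3)
    print(component.value)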

Note on the Logger

We provide a simple Logger that logs both in tensorboard format and as pickle files that can be re-read to make tables and figures. See logger. This logger can easily be replaced by any other logger.

Description

Sequential Decision Making is much more than Reinforcement Learning

Sequential Decision Making is about interactions:

  • Interaction with data (e.g. attention models, decision trees, cascade models, active sensing, active learning, recommendation, ...)
  • Interaction with an environment (e.g. games, control)
  • Interaction with humans (e.g. recommender systems, dialog systems, health systems, ...)
  • Interaction with a model of the world (e.g. simulation)
  • Interaction between multiple entities (e.g. multi-agent RL)

What salina is

  • A sandbox for developing sequential models at scale.

  • A small 'core' codebase (a few hundred lines) that defines everything you will use to implement agents involved in sequential decision learning systems.

    • It is easy to understand and to use since it keeps the main principles of pytorch, just extending nn.Module to Agent in order to handle the temporal dimension.

  • A set of agents that can be combined (like pytorch modules) to obtain complex behaviors (a short sketch follows this list).

  • A set of reference implementations and examples in different domains: Reinforcement Learning, Imitation Learning, Computer Vision, ... (more to come).
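
Below is a hedged, self-contained sketch of such a combination (toy agents, illustrative names): Agents chains several agents at a single timestep, and TemporalAgent runs the chain over a range of timesteps, following the pattern used in the issue snippets further down. It assumes Agents and TemporalAgent live in salina.agents.

    import torch
    from salina import Agent, Workspace
    from salina.agents import Agents, TemporalAgent

    class CounterAgent(Agent):
        """Toy agent: writes the current timestep for a batch of size 2."""

        def forward(self, t, **kwargs):
            self.set(("counter", t), torch.full((2,), float(t)))

    class DoublerAgent(Agent):
        """Toy agent: reads the counter written at the same timestep and doubles it."""

        def forward(self, t, **kwargs):
            self.set(("doubled", t), 2.0 * self.get(("counter", t)))

    pipeline = TemporalAgent(Agents(CounterAgent(), DoublerAgent()))
    workspace = Workspace()
    pipeline(workspace, t=0, n_steps=4)   # fills "counter" and "doubled" for t = 0..3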

What salina is not

  • Yet another reinforcement learning framework: salina is focused on sequential decision making in general. It can be used for RL (which is our main current use-case), but also for supervised learning, attention models, multi-agent learning, planning, control, cascade models, recommender systems,...
  • A library: salina is just a small layer on top of pytorch that encourages good practices for implementing sequential models. It is thus very simple to understand and to use, yet very powerful.

Citing salina

Please use this bibtex if you want to cite this repository in your publications:

Link to the paper: SaLinA: Sequential Learning of Agents

    @misc{salina,
        author = {Ludovic Denoyer and Alfredo de la Fuente and Song Duong and Jean-Baptiste Gaya and Pierre-Alexandre Kamienny and Daniel H. Thompson},
        title = {SaLinA: Sequential Learning of Agents},
        year = {2021},
        publisher = {arXiv},
        howpublished = {\url{https://github.com/facebookresearch/salina}},
    }

Papers using SaLinA:

  • Learning a subspace of policies for online adaptation in Reinforcement Learning. Jean-Baptiste Gaya, Laure Soulier, Ludovic Denoyer - arXiv

License

salina is released under the MIT license. See LICENSE for additional details about it. See also our Terms of Use and Privacy Policy.

Comments
  • Variable workspace tensor sizes

    Variable workspace tensor sizes

    Currently, we are unable to do the following:

    import torch
    from salina import Workspace

    ws = Workspace()
    batch_size = 5
    ws.set("obs", 0, torch.zeros(batch_size, 3))
    ws.set("obs", 1, torch.zeros(batch_size, 5))  # fails: feature size differs from the one set at t=0
    

    due to https://github.com/facebookresearch/salina/blob/10d09bb80f78e05ddd7de58e9e24ff0f302877fb/salina/workspace.py#L45

    Since tensors from sequential timesteps are stored as lists, I suspect variable-sized features should be possible. This would be immensely useful in multiagent systems as well as recurrent models (e.g. building/expanding a map as the agent explores).

    opened by smorad 8
  • Documentation and testing

    Documentation and testing

    This is a really interesting framework, I'd love to move some of our code to it. The workspace abstraction should greatly simplify our multiagent and recurrent models. I am a bit worried about the readability/correctness though -- are there any plans to write up proper documentation (e.g. https://readthedocs.org) and unit tests?

    opened by smorad 5
  • [xformers] blocksparse agent

    [xformers] blocksparse agent

    cc @ludc, not a lot of time but this should work. TODOs:

    • [x] handle dimensions not being powers of two (max episodes -> 1024 vs 1000). Either change the episodes or pad
    • [ ] (be smarter in how to choose in between sparse and blocksparse. If the time span is small enough, use sparse, else blocksparse)

    let me know what you think. Requires a TensorCore enabled GPU, should be faster than sparse for some regimes and fully fp16 aware

    CLA Signed 
    opened by blefaudeux 4
  • Rename **args to **kwargs

    Rename **args to **kwargs

    Fixes #16.

    There were a few more instances than I expected, but I think I got everything. I fixed a couple of other little issues at the same time which would have thrown errors.

    I also noticed that some of the files weren't black-formatted even though the README indicated to run isort + black through pre-commit. I ran pre-commit run --all-files to fix these formatting issues. There are several unused imports (and some unused variables) too, but I didn't want to go looking for all of these, so I left the ones I saw.

    CLA Signed 
    opened by neighthan 3
  • Requirements not complete and unable to run examples

    Requirements not complete and unable to run examples

    I just cloned the repo and tried to run some of the provided examples. Unfortunately I couldn't run them as the requirements are not clear. I tried python 3.7 - 3.9 to see if it was an issue of the python version. For each I did the following:

    • create clean anaconda environments (e.g. conda create -n salina-test python=3.7)
    • install torch and torchvision (conda install pytorch torchvision torchaudio cpuonly -c pytorch)
    • pip install -r requirements.txt
    • python setup.py install

    For individual examples I first get ModuleNotFoundErrors. E.g., when running salina_examples/rl/a2c/mono_cpu/main.py I encounter the following errors: ModuleNotFoundError: No module named 'graphviz' and ModuleNotFoundError: No module named 'pandas'. After fixing those I get ModuleNotFoundError: No module named 'salina_examples.rl.a2c'.

    Similarly, when trying to run salina_examples/rl/ppo_continuous/ppo.py I first encounter ModuleNotFoundError: No module named 'cv2' and after fixing it I get the following error:

    Traceback (most recent call last):
      File "/home/biedenka/anaconda3/envs/salina-test/lib/python3.8/site-packages/gym/envs/registration.py", line 158, in spec
        return self.env_specs[id]
    KeyError: 'Pendulum-v0'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "salina_examples/rl/ppo_continuous/ppo.py", line 180, in main
        action_agent = instantiate_class(cfg.action_agent)
      File "/home/biedenka/anaconda3/envs/salina-test/lib/python3.8/site-packages/salina-1.0-py3.8.egg/salina/__init__.py", line 27, in instantiate_class
        return c(**d)
      File "/home/biedenka/anaconda3/envs/salina-test/lib/python3.8/site-packages/salina-1.0-py3.8.egg/salina_examples/rl/ppo_continuous/agents.py", line 92, in __init__
        env = instantiate_class(args["env"])
      File "/home/biedenka/anaconda3/envs/salina-test/lib/python3.8/site-packages/salina-1.0-py3.8.egg/salina/__init__.py", line 27, in instantiate_class
        return c(**d)
      File "/home/biedenka/anaconda3/envs/salina-test/lib/python3.8/site-packages/salina-1.0-py3.8.egg/salina_examples/rl/ppo_continuous/agents.py", line 33, in make_gym_env
        e = gym.make(env_args["env_name"])
      File "/home/biedenka/anaconda3/envs/salina-test/lib/python3.8/site-packages/gym/envs/registration.py", line 235, in make
        return registry.make(id, **kwargs)
      File "/home/biedenka/anaconda3/envs/salina-test/lib/python3.8/site-packages/gym/envs/registration.py", line 128, in make
        spec = self.spec(path)
      File "/home/biedenka/anaconda3/envs/salina-test/lib/python3.8/site-packages/gym/envs/registration.py", line 185, in spec
        raise error.DeprecatedEnv(
    gym.error.DeprecatedEnv: Env Pendulum-v0 not found (valid versions include ['Pendulum-v1'])
    
    Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
    

    which is caused by a recent change in gym. For reference see:

    • https://github.com/openai/gym/pull/2423

    To me it looks like these are all dependency issues (and the ModuleNotFoundError: No module named 'salina_examples.rl.a2c' might be fixed by including an __init__.py in the folder). Could you please state the minimal version numbers for which salina works, as well as the python version (more prominently than in the setup.py)?

    opened by AndreBiedenkapp 3
  • Where can I get the test configuration of the subspace_of_policies example?

    Where can I get the test configuration of the subspace_of_policies example?

    Hi,

    Thanks for sharing your interesting project! I have recently become interested in the project related to the paper Learning a Subspace of Policies for Online Adaptation in RL. The paper says the code is released in this repository, so I tried to run the subspace_of_policies code. I succeeded in running the training code, train.py, after slightly modifying it, and after more substantial modifications I got the evaluation code working as well. However, I think that to reproduce the results reported in the paper, the evaluation configuration should be provided, with components such as the torso, thigh, shin, foot, gravity, and friction.

    I think the test_cfgs is needed: from salina_examples.rl.subspace_of_policies.envs import test_cfgs

    Could you share the code?

    Best regards,

    opened by aithlab 2
  • Chunking Recurrent States and Truncated BPTT

    Chunking Recurrent States and Truncated BPTT

    Hello,

    I'm interested in loading and storing recurrent states for training over longer episodes. This is generally called truncated backpropagation through time (BPTT). For example, in the following case we break each trajectory into 80-timestep chunks:

    from salina import Workspace
    from salina.agents import Agents, TemporalAgent
    from salina.agents.gyma import AutoResetGymAgent

    # make_cartpole, LSTMAgent, QNetworkAgent and EpsilonGreedyActorAgent are
    # placeholder components from this issue, not part of salina itself.
    env = AutoResetGymAgent(
        make_cartpole,
        n_envs=2,
    )
    actor = Agents(
        LSTMAgent(hidden_size=32),
        QNetworkAgent(input_handle="state", num_actions=2),
        EpsilonGreedyActorAgent(epsilon=0.02),
    )
    collector = TemporalAgent(Agents(env, actor))

    ws = Workspace()
    for epoch in range(10):
        collector(ws, t=0, n_steps=80)
    

    Currently, if an episode is > 80 timesteps, it will receive a recurrent state of zeros. Does Salina provide a way to load the previous recurrent state?

    opened by smorad 2
  • Reward trajectory is off by one

    Reward trajectory is off by one

    Hey there, I find it somewhat counterintuitive that the framework uses a default reward of 0 at t=0 (see gyma.py lines 279 & 292). Note that the gym interface only returns the initial state on reset (https://github.com/openai/gym/blob/103b7633f564a60062a25cc640ed1e189e99ddb7/gym/core.py#L8). Isn't it more common to assume that r_t = R(s_t, a_t), so that r_t is the outcome of \pi(s_t)? Currently, r_{t+1} is the outcome of \pi(s_t). In the A2C example this leads to some confusion, where reward[1:] is the reward at t and critic[1:] the state value at t+1 (but both use a 1):

     target = reward[1:] + cfg.algorithm.discount_factor * critic[1:].detach() * (1 - done[1:].float())
    

    Best regards

    edit: Fig. 13 & Fig. 14 in the arXiv paper use set.get(...); I believe it should be self.get(...) :-)

    opened by romue404 2
  • [feat] xformers agent

    [feat] xformers agent

    adding another Transformer-based agent, using xformers under the hood. For masks sparse enough (few enough time slices), this means that the computation will be naturally sparse, saving time and memory

    CLA Signed 
    opened by blefaudeux 1
  • Bug in NRemoteAgent?

    Bug in NRemoteAgent?

    NRemoteAgent's create method has the signature def create(agent, num_processes=0, time_size=None, **extra_args):. Should this be def create(self, agent, num_processes=0, time_size=None, **extra_args):? Right now, it seems the agent is making copies of itself to put in remote processes instead of making copies of an agent that's passed to create.

    https://github.com/facebookresearch/salina/blob/f231c77e44e87713d54984fa08ef4b38be47f644/salina/agents/remote.py#L193

    opened by neighthan 1
  • Rename **args to **kwargs

    Rename **args to **kwargs

    Some functions that accept arbitrary keyword arguments (e.g. Agents.__call__) name the keyword arguments args (e.g. __call__(self, workspace, **args)). What would you think of renaming this to __call__(self, workspace, **kwargs)? Even with the ** making it clear these were keyword arguments, I was confused for a bit thinking that these were positional arguments, since it's common to use f(*args, **kwargs) in Python. I'd be happy to submit a PR renaming any **args to **kwargs if you'll accept it; it shouldn't change anything for users since that name can only be used inside the functions. Otherwise, feel free to close this.

    opened by neighthan 1
  • Do you have any plans to support a flexible size for the workspace?

    Do you have any plans to support a flexible size for the workspace?

    Hi,

    I've recently been using your library for a reinforcement learning project. More precisely, I'm interested in multi-agent systems that provide one reward per agent. In my environment, each step returns rewards of size n_agents x 1. However, the workspace initializes the reward as "torch.tensor([0.0]).float()" at the first time step (t=0), so at the second time step it cannot store the new reward of size n_agents x 1 because the already-stored reward has size torch.Size([1]). I think there are many cases with multi-dimensional rewards, not only in my environment but also in others.

    I think it would be very useful if your library supported a flexible reward size.

    Best regards,

    opened by aithlab 2
  • What game other than CartPole-v0 is the A2C agent good at?

    What game other than CartPole-v0 is the A2C agent good at?

    Hi,

    I've been working with the a2c example agent you provide, and haven't found any game other than CartPole-v0 that it can learn well. Is there any other game that it is good at?

    Thank you so much.

    Best, Karin

    opened by wooloo1121 3
Owner
Facebook Research
Omnidirectional Scene Text Detection with Sequential-free Box Discretization (IJCAI 2019). Including competition model, online demo, etc.

Box_Discretization_Network This repository is built on the pytorch [maskrcnn_benchmark]. The method is the foundation of our ReCTs-competition method

Yuliang Liu 266 Nov 24, 2022
A lightweight Python-based 3D network multi-agent simulator. Uses a cell-based congestion model. Calculates risk, loudness and battery capacities of the agents. Suitable for 3D network optimization tasks.

AMAZ3DSim AMAZ3DSim is a lightweight python-based 3D network multi-agent simulator. It uses a cell-based congestion model. It calculates risk, battery

Daniel Hirsch 13 Nov 4, 2022
Weighing Counts: Sequential Crowd Counting by Reinforcement Learning

LibraNet This repository includes the official implementation of LibraNet for crowd counting, presented in our paper: Weighing Counts: Sequential Crow

Hao Lu 18 Nov 5, 2022
banditml is a lightweight contextual bandit & reinforcement learning library designed to be used in production Python services.

banditml is a lightweight contextual bandit & reinforcement learning library designed to be used in production Python services. This library is developed by Bandit ML and ex-authors of Facebook's applied reinforcement learning platform, Reagent.

Bandit ML 51 Dec 22, 2022
Pytorch implementations of popular off-policy multi-agent reinforcement learning algorithms, including QMix, VDN, MADDPG, and MATD3.

Off-Policy Multi-Agent Reinforcement Learning (MARL) Algorithms This repository contains implementations of various off-policy multi-agent reinforceme

null 183 Dec 28, 2022
Independent and minimal implementations of some reinforcement learning algorithms using PyTorch (including PPO, A3C, A2C, ...).

PyTorch RL Minimal Implementations There are implementations of some reinforcement learning algorithms, whose characteristics are as follow: Less pack

Gemini Light 4 Dec 31, 2022
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.

null 730 Jan 9, 2023
sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code

sequitur sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code. It implements three differ

Jonathan Shobrook 305 Dec 21, 2022
Train robotic agents to learn pick and place with deep learning for vision-based manipulation in PyBullet.

Ravens is a collection of simulated tasks in PyBullet for learning vision-based robotic manipulation, with emphasis on pick and place. It features a Gym-like API with 10 tabletop rearrangement tasks, each with (i) a scripted oracle that provides expert demonstrations (for imitation learning), and (ii) reward functions that provide partial credit (for reinforcement learning).

Google Research 367 Jan 9, 2023
ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge (ManiSkill Challenge), a large-scale learning-from-demonstrations benchmark for object manipulation.

ManiSkill-Learn ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge, a large-scale learning-from-dem

Hao Su's Lab, UCSD 48 Dec 30, 2022
Conservative Q Learning for Offline Reinforcement Learning in JAX

CQL-JAX This repository implements Conservative Q Learning for Offline Reinforcement Learning in JAX (FLAX). Implementation is built on

Karush Suri 8 Nov 7, 2022
Reinforcement-learning - Repository of the class assignment questions for the course on reinforcement learning

DSE 314/614: Reinforcement Learning This repository containing reinforcement lea

Manav Mishra 4 Apr 15, 2022
Lightweight mmm - Lightweight (Bayesian) Media Mix Model

Lightweight (Bayesian) Media Mix Model This is not an official Google product. L

Google 342 Jan 3, 2023
Trading environnement for RL agents, backtesting and training.

TradzQAI Trading environnement for RL agents, backtesting and training. Live session with coinbasepro-python is finaly arrived ! Available sessions: L

Tony Denion 164 Oct 30, 2022
Lux AI environment interface for RLlib multi-agents

Lux AI interface to RLlib MultiAgentsEnv For Lux AI Season 1 Kaggle competition. LuxAI repo RLlib-multiagents docs Kaggle environments repo Please let

Jaime 12 Nov 7, 2022
PyTorch implementation of our ICCV 2021 paper, Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents.

PyTorch implementation of our ICCV 2021 paper, Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents.

Saim Wani 4 May 8, 2022
A user-friendly research and development tool built to standardize RL competency assessment for custom agents and environments.

Built with ❤️ by Sam Showalter Contents Overview Installation Dependencies Usage Scripts Standard Execution Environment Development Environment Benchm

SRI-AIC 1 Nov 18, 2021
Pacman-AI - AI project designed by UC Berkeley. Designed reflex and minimax agents for the game Pacman.

Pacman AI Jussi Doherty CAP 4601 - Introduction to Artificial Intelligence - Fall 2020 Python version 3.0+ Source of this project This repo contains a

Jussi Doherty 1 Jan 3, 2022
Fake-user-agent-traffic-geneator - Python CLI Tool to generate fake traffic against URLs with configurable user-agents

Fake traffic generator for Gartner Demo Generate fake traffic to URLs with custo

New Relic Experimental 3 Oct 31, 2022