A lightweight library for sequential learning agents, including reinforcement learning

Facebook Research

Last update: Dec 17, 2022


Overview

SaLinA - A Flexible and Simple Library for Learning Sequential Agents (including Reinforcement Learning)

TL;DR

salina is a lightweight library extending PyTorch modules for developing sequential decision models. It can be used for Reinforcement Learning (including model-based with differentiable environments, multi-agent RL, ...), but also in supervised/unsupervised learning settings (for instance for NLP, Computer Vision, etc.).

It lets you write very complex sequential models (or policies) in a few lines
It works on multiple CPUs and GPUs

Quick Start

Just clone the repo

Documentation

For development, set up pre-commit hooks:

Run pip install pre-commit
- or conda install -c conda-forge pre-commit
- or brew install pre-commit
In the top directory of the repo, run pre-commit install to set up the git hook scripts
Now pre-commit will run automatically on git commit!
Currently isort, black and blacken-docs are used, in that order

Organization of the repo

salina is the core library
- salina.agents is the catalog of agents (the same as torch.nn but for agents)
salina_examples provides many examples (in different domains)

Dependencies

salina uses pytorch, hydra for configuring experiments, and gym for reinforcement learning algorithms.

Note on the Logger

We provide a simple Logger that logs both in tensorboard format and as pickle files that can be re-read to make tables and figures. See logger. This logger can easily be replaced by any other logger.
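As a rough illustration of the pickle side of such a logger, the sketch below accumulates scalar values per name and iteration, writes them to a pickle file, and reads them back for tabulation. `ToyPickleLogger` and its methods are hypothetical names invented for this example; this is not salina's Logger API, only an assumption about what a replacement logger could look like.

```python
import os
import pickle
import tempfile
from collections import defaultdict


class ToyPickleLogger:
    """Hypothetical minimal logger (not salina's Logger): keeps scalars and pickles them."""

    def __init__(self, path):
        self.path = path
        # Maps a scalar name to a list of (iteration, value) pairs.
        self.scalars = defaultdict(list)

    def add_scalar(self, name, value, iteration):
        self.scalars[name].append((iteration, float(value)))

    def save(self):
        # Convert to a plain dict so the file does not depend on defaultdict.
        with open(self.path, "wb") as f:
            pickle.dump(dict(self.scalars), f)

    @staticmethod
    def load(path):
        with open(path, "rb") as f:
            return pickle.load(f)


log_path = os.path.join(tempfile.mkdtemp(), "log.pkl")
logger = ToyPickleLogger(log_path)
for iteration, reward in enumerate([1.0, 2.5, 4.0]):
    logger.add_scalar("reward", reward, iteration)
logger.save()

# Re-read the file later, e.g. to build a table or a figure.
reloaded = ToyPickleLogger.load(log_path)
for iteration, value in reloaded["reward"]:
    print(iteration, value)
```

Storing plain Python containers keeps the pickle readable without importing the logger class itself.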

Description

Sequential Decision Making is much more than Reinforcement learning

Sequential Decision Making is about interactions:
Interaction with data (e.g. attention-models, decision tree, cascade models, active sensing, active learning, recommendation, etc.)
Interaction with an environment (e.g. games, control)
Interaction with humans (e.g. recommender systems, dialog systems, health systems, ...)
Interaction with a model of the world (e.g. simulation)
Interaction between multiple entities (e.g. multi-agent RL)

What `salina` is

A sandbox for developing sequential models at scale.
A small (about 300 lines) 'core' code that defines everything you will use to implement agents involved in sequential decision learning systems.
- It is easy to understand and to use since it keeps the main principles of pytorch, just extending nn.Module to Agent, which handles the temporal dimension.

A set of agents that can be combined (like pytorch modules) to obtain complex behaviors

A set of reference implementations and examples in different domains: Reinforcement Learning, Imitation Learning, Computer Vision, ... (more to come)
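The idea of agents that read and write time-indexed variables can be sketched in plain Python. Everything below (`ToyWorkspace`, `ToyAgent`, `CounterEnv`, `ParityPolicy`, `run`) is a hypothetical illustration written for this README, not salina's actual classes; it only assumes that an agent is called with a shared workspace and a timestep.

```python
class ToyWorkspace:
    """Hypothetical stand-in for a workspace: values indexed by (name, timestep)."""

    def __init__(self):
        self._data = {}

    def set(self, name, t, value):
        self._data[(name, t)] = value

    def get(self, name, t):
        return self._data[(name, t)]


class ToyAgent:
    """Hypothetical agent: reads from and writes to the workspace at time t."""

    def __call__(self, workspace, t):
        raise NotImplementedError


class CounterEnv(ToyAgent):
    def __call__(self, workspace, t):
        # Observation is 0 at the start, then previous observation + previous action.
        if t == 0:
            workspace.set("obs", 0, 0)
        else:
            prev = workspace.get("obs", t - 1) + workspace.get("action", t - 1)
            workspace.set("obs", t, prev)


class ParityPolicy(ToyAgent):
    def __call__(self, workspace, t):
        # Action 1 on even observations, action 2 on odd ones.
        obs = workspace.get("obs", t)
        workspace.set("action", t, 1 if obs % 2 == 0 else 2)


def run(agents, workspace, n_steps):
    # Execute the agents in order at each timestep.
    for t in range(n_steps):
        for agent in agents:
            agent(workspace, t)


ws = ToyWorkspace()
run([CounterEnv(), ParityPolicy()], ws, n_steps=4)
observations = [ws.get("obs", t) for t in range(4)]
print(observations)
```

Because the environment and the policy share one interface, they can be chained in a list just like any other pair of agents.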

What `salina` is not

Yet another reinforcement learning framework: salina is focused on sequential decision making in general. It can be used for RL (which is our main current use-case), but also for supervised learning, attention models, multi-agent learning, planning, control, cascade models, recommender systems,...
A library: salina is just a small layer on top of pytorch that encourages good practices for implementing sequential models. It is thus very simple to understand and to use, but very powerful.

Citing `salina`

Please use this bibtex if you want to cite this repository in your publications:

Link to the paper: SaLinA: Sequential Learning of Agents

    @misc{salina,
        author = {Ludovic Denoyer and Alfredo de la Fuente and Song Duong and Jean-Baptiste Gaya and Pierre-Alexandre Kamienny and Daniel H. Thompson},
        title = {SaLinA: Sequential Learning of Agents},
        year = {2021},
        publisher = {Arxiv},
        howpublished = {\url{https://gitHub.com/facebookresearch/salina}},
    }

Papers using SaLinA:

Learning a subspace of policies for online adaptation in Reinforcement Learning. Jean-Baptiste Gaya, Laure Soulier, Ludovic Denoyer - Arxiv

License

salina is released under the MIT license. See LICENSE for additional details about it. See also our Terms of Use and Privacy Policy.

Comments

Variable workspace tensor sizes
Currently, we are unable to do the following:

    import torch
    from salina import Workspace

    ws = Workspace()
    batch_size = 5
    ws.set("obs", 0, torch.zeros(batch_size, 3))
    ws.set("obs", 1, torch.zeros(batch_size, 5))  # different feature size at t=1

due to https://github.com/facebookresearch/salina/blob/10d09bb80f78e05ddd7de58e9e24ff0f302877fb/salina/workspace.py#L45

Since tensors from sequential timesteps are stored as lists, I suspect variable-sized features should be possible. This would be immensely useful in multiagent systems as well as recurrent models (e.g. building/expanding a map as the agent explores).
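To make the request concrete, here is a minimal sketch of a list-backed store in which each timestep may hold a value of a different size. `ListBackedStore` and `zeros` are hypothetical names, nested Python lists stand in for tensors, and nothing here reflects how salina's Workspace is actually implemented.

```python
class ListBackedStore:
    """Hypothetical store: one Python list per variable, one entry per timestep."""

    def __init__(self):
        self._vars = {}

    def set(self, name, t, value):
        seq = self._vars.setdefault(name, [])
        # Pad with None so that timesteps can be written in any order.
        while len(seq) <= t:
            seq.append(None)
        seq[t] = value

    def get(self, name, t):
        return self._vars[name][t]

    def shapes(self, name):
        # (rows, columns) of every stored entry, assuming 2-D nested lists.
        return [(len(v), len(v[0])) for v in self._vars[name]]


def zeros(rows, cols):
    """Nested-list stand-in for a (rows, cols) tensor of zeros."""
    return [[0.0] * cols for _ in range(rows)]


batch_size = 5
store = ListBackedStore()
store.set("obs", 0, zeros(batch_size, 3))
store.set("obs", 1, zeros(batch_size, 5))  # a different feature size is accepted
print(store.shapes("obs"))
```

The trade-off in this sketch is that a variable can no longer be returned as one stacked array over time without padding or masking.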
opened by smorad 8
Documentation and testing

This is a really interesting framework, I'd love to move some of our code to it. The workspace abstraction should greatly simplify our multiagent and recurrent models. I am a bit worried about the readability/correctness though -- are there any plans to write up proper documentation (e.g. https://readthedocs.org) and unit tests?

opened by smorad 5
[xformers] blocksparse agent
cc @ludc, not a lot of time but this should work. TODOs:

[x] handle dimensions not being powers of two (max episodes -> 1024 vs 1000). Either change the episodes or pad

[ ] (be smarter in how to choose in between sparse and blocksparse. If the time span is small enough, use sparse, else blocksparse)

let me know what you think. Requires a TensorCore enabled GPU, should be faster than sparse for some regimes and fully fp16 aware
CLA Signed
opened by blefaudeux 4
Rename **args to **kwargs

Fixes #16.

There were a few more instances than I expected, but I think I got everything. I fixed a couple of other little issues at the same time which would have thrown errors.

I also noticed that some of the files weren't black-formatted even though the README indicated to run isort + black through pre-commit. I ran pre-commit run --all-files to fix these formatting issues. There are several unused imports (and some unused variables) too, but I didn't want to go looking for all of these, so I left the ones I saw.
CLA Signed

opened by neighthan 3

Requirements not complete and unable to run examples

I just cloned the repo and tried to run some of the provided examples. Unfortunately I couldn't run them as the requirements are not clear. I tried python 3.7 - 3.9 to see if it was an issue of the python version. For each I did the following:

create clean anaconda environments (e.g. conda create -n salina-test python=3.7)
install torch and torchvision (conda install pytorch torchvision torchaudio cpuonly -c pytorch)
pip install -r requirements.txt
python setup.py install

For individual examples I first get ModuleNotFoundErrors. E.g., when running salina_examples/rl/a2c/mono_cpu/main.py I encounter the following errors: ModuleNotFoundError: No module named 'graphviz' and ModuleNotFoundError: No module named 'pandas'. After fixing those I get ModuleNotFoundError: No module named 'salina_examples.rl.a2c'.
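A quick way to list such missing modules before launching an example is to probe for them with the standard library. The sketch below is only a convenience check; `missing_modules` is a hypothetical helper, and the list of names is taken from the errors quoted in this issue rather than from any official requirements file.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules reported as missing in this issue.
required = ["graphviz", "pandas", "cv2"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules were found")
```

Note that the import name and the package name can differ; for instance, `cv2` is provided by the `opencv-python` package on PyPI.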

Similarly, when trying to run salina_examples/rl/ppo_continuous/ppo.py I first encounter ModuleNotFoundError: No module named 'cv2' and after fixing it I get the following error:

Traceback (most recent call last):
  File "/home/biedenka/anaconda3/envs/salina-test/lib/python3.8/site-packages/gym/envs/registration.py", line 158, in spec
    return self.env_specs[id]
KeyError: 'Pendulum-v0'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "salina_examples/rl/ppo_continuous/ppo.py", line 180, in main
    action_agent = instantiate_class(cfg.action_agent)
  File "/home/biedenka/anaconda3/envs/salina-test/lib/python3.8/site-packages/salina-1.0-py3.8.egg/salina/__init__.py", line 27, in instantiate_class
    return c(**d)
  File "/home/biedenka/anaconda3/envs/salina-test/lib/python3.8/site-packages/salina-1.0-py3.8.egg/salina_examples/rl/ppo_continuous/agents.py", line 92, in __init__
    env = instantiate_class(args["env"])
  File "/home/biedenka/anaconda3/envs/salina-test/lib/python3.8/site-packages/salina-1.0-py3.8.egg/salina/__init__.py", line 27, in instantiate_class
    return c(**d)
  File "/home/biedenka/anaconda3/envs/salina-test/lib/python3.8/site-packages/salina-1.0-py3.8.egg/salina_examples/rl/ppo_continuous/agents.py", line 33, in make_gym_env
    e = gym.make(env_args["env_name"])
  File "/home/biedenka/anaconda3/envs/salina-test/lib/python3.8/site-packages/gym/envs/registration.py", line 235, in make
    return registry.make(id, **kwargs)
  File "/home/biedenka/anaconda3/envs/salina-test/lib/python3.8/site-packages/gym/envs/registration.py", line 128, in make
    spec = self.spec(path)
  File "/home/biedenka/anaconda3/envs/salina-test/lib/python3.8/site-packages/gym/envs/registration.py", line 185, in spec
    raise error.DeprecatedEnv(
gym.error.DeprecatedEnv: Env Pendulum-v0 not found (valid versions include ['Pendulum-v1'])

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

which is caused by a recent change in gym. For reference see:

https://github.com/openai/gym/pull/2423

To me it looks like these are all dependency issues (and the ModuleNotFoundError: No module named 'salina_examples.rl.a2c' might be fixed by including an __init__.py in the folder). Could you please state the minimal version numbers of the dependencies for which salina works, as well as the Python version (more prominently than in the setup.py)?
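Until versions are pinned, one possible workaround for the deprecated environment id is to fall back to the newest registered version of the same environment. The helper below is a hypothetical sketch: `resolve_env_id` is not part of gym or salina, and the list of registered ids is passed in explicitly rather than read from gym's registry.

```python
import re

_ENV_ID = re.compile(r"(.+)-v(\d+)")


def resolve_env_id(requested, registered_ids):
    """Return `requested` if registered, else the highest registered version of that env."""
    if requested in registered_ids:
        return requested
    match = _ENV_ID.fullmatch(requested)
    if match is None:
        raise KeyError(requested)
    name = match.group(1)
    # Collect the version numbers registered under the same environment name.
    versions = []
    for env_id in registered_ids:
        other = _ENV_ID.fullmatch(env_id)
        if other is not None and other.group(1) == name:
            versions.append(int(other.group(2)))
    if not versions:
        raise KeyError(requested)
    return f"{name}-v{max(versions)}"


# Example ids only; a real list would come from the installed gym version.
registered = ["CartPole-v0", "CartPole-v1", "Pendulum-v1"]
print(resolve_env_id("Pendulum-v0", registered))
```

A newer environment version may differ in dynamics or reward scale, so results obtained after such a fallback are not necessarily comparable with published numbers.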

opened by AndreBiedenkapp 3

Where can I get the test configuration of the subspace_of_policies example?

Hi,

Thanks for sharing your interesting project! I have recently become interested in the paper Learning a Subspace of Policies for Online Adaptation in RL. The paper states that the code is released in this repository, so I tried to run the subspace_of_policies example. I got the training script, train.py, running after slight modifications, and the evaluation code working after more substantial changes. However, to reproduce the results reported in the paper, I think the evaluation configuration is needed, with components such as torso, thigh, shin, foot, gravity, and friction.

I think the test_cfgs is needed:

    from salina_examples.rl.subspace_of_policies.envs import test_cfgs

Could you share the code?

Best regards,

opened by aithlab 2
Chunking Recurrent States and Truncated BPTT
Hello,

I'm interested in loading and storing recurrent states for training over longer episodes. This is generally called truncated back propagation through time (BPTT). For example, in the following case we break each trajectory into 80-timestep chunks:

    env = AutoResetGymAgent(
        make_cartpole,
        n_envs=2,
    )
    actor = Agents(
        LSTMAgent(hidden_size=32),
        QNetworkAgent(input_handle="state", num_actions=2),
        EpsilonGreedyActorAgent(epsilon=0.02),
    )
    collector = TemporalAgent(Agents(env, actor))
    ws = Workspace()
    for epoch in range(10):
        collector(ws, t=0, n_steps=80)

Currently, if an episode is > 80 timesteps, it will receive a recurrent state of zeros. Does Salina provide a way to load the previous recurrent state?
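The behaviour being asked for can be illustrated with a toy recurrence in plain Python: the final hidden state of one chunk is fed in as the initial state of the next, so chunked processing matches processing the whole episode at once. `recurrent_step` and `run_chunk` are hypothetical names and do not correspond to salina functions; in a PyTorch version one would typically also detach the carried state so that gradients stop at the chunk boundary.

```python
def recurrent_step(hidden, x):
    """Toy recurrent cell: a running average of the inputs."""
    return 0.5 * hidden + 0.5 * x


def run_chunk(inputs, initial_hidden):
    """Process one chunk and return (per-step outputs, final hidden state)."""
    hidden = initial_hidden
    outputs = []
    for x in inputs:
        hidden = recurrent_step(hidden, x)
        outputs.append(hidden)
    return outputs, hidden


episode = [float(i) for i in range(1, 9)]  # 8 timesteps
chunk_size = 4

# Chunked processing, carrying the hidden state across chunk boundaries.
carried = 0.0
chunked_outputs = []
for start in range(0, len(episode), chunk_size):
    outs, carried = run_chunk(episode[start:start + chunk_size], carried)
    chunked_outputs.extend(outs)

# Chunked processing that resets the state to zero at every chunk.
reset_outputs = []
for start in range(0, len(episode), chunk_size):
    outs, _ = run_chunk(episode[start:start + chunk_size], 0.0)
    reset_outputs.extend(outs)

full_outputs, _ = run_chunk(episode, 0.0)
print(chunked_outputs == full_outputs, reset_outputs == full_outputs)
```

Only the variant that carries the state reproduces the full-episode outputs; resetting to zeros changes every output after the first chunk.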
opened by smorad 2
Reward trajectory is off by one
Hey there, I find it somewhat counterintuitive that the framework uses a default reward of 0 at t=0 (see gyma.py line 279 & 292). Note that the gym interface only returns the initial state on reset (https://github.com/openai/gym/blob/103b7633f564a60062a25cc640ed1e189e99ddb7/gym/core.py#L8). Isn't it more common to assume that r_t = R(s_t, a_t) and consequently r_t is the outcome of \pi(s_t)? Currently, r_{t+1} is the outcome of \pi(s_t). In the A2C example this leads to some confusion, where reward[1:] is the reward at t and critic[1:] the state value at t+1 (although both are indexed with [1:]):

target = reward[1:] + cfg.algorithm.discount_factor * critic[1:].detach() * (1 - done[1:].float())
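Written out per timestep, the line above pairs the value at t with the reward and value stored at index t+1. The sketch below spells this out with plain Python lists under the convention described in this issue (the reward stored at t+1 is the outcome of the action taken at t); `td_targets` is a hypothetical helper and the numbers are made up.

```python
def td_targets(reward, critic, done, discount):
    """One-step targets when reward[t + 1] is the outcome of the action taken at t."""
    targets = []
    for t in range(len(reward) - 1):
        # No bootstrapping from the value of a terminal state.
        not_done = 0.0 if done[t + 1] else 1.0
        targets.append(reward[t + 1] + discount * critic[t + 1] * not_done)
    return targets


reward = [0.0, 1.0, 1.0, 1.0]   # reward[0] is the default 0 written at reset
critic = [2.0, 1.5, 1.0, 0.5]   # state values V(s_t)
done = [False, False, False, True]

targets = td_targets(reward, critic, done, discount=0.9)
# targets[t] is the regression target for critic[t], for t = 0 .. T-2.
print(targets)
```

With the alternative convention r_t = R(s_t, a_t), the same target would read reward[t] + discount * critic[t + 1], which is the indexing the issue argues is more common.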

Best regards

Edit: Fig. 13 & Fig. 14 in the arXiv paper use set.get(...); I believe it should be self.get(...) :-)
opened by romue404 2
[feat] xformers agent

adding another Transformer-based agent, using xformers under the hood. For masks sparse enough (few enough time slices), this means that the computation will be naturally sparse, saving time and memory
CLA Signed

opened by blefaudeux 1
Bug in NRemoteAgent?

NRemoteAgent's create method has the signature def create(agent, num_processes=0, time_size=None, **extra_args):. Should this be def create(self, agent, num_processes=0, time_size=None, **extra_args):? Right now, it seems the agent is making copies of itself to put in remote processes instead of making copies of an agent that's passed to create.

https://github.com/facebookresearch/salina/blob/f231c77e44e87713d54984fa08ef4b38be47f644/salina/agents/remote.py#L193
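How a method declared without self behaves depends on whether it is called on the class or on an instance. The toy classes below (hypothetical names, unrelated to the real NRemoteAgent implementation) show the difference, and how @staticmethod makes both call styles behave the same.

```python
class WithoutSelf:
    def create(agent, num_processes=0):
        # Plain function stored on the class: there is no implicit instance argument.
        return agent, num_processes


class WithStaticmethod:
    @staticmethod
    def create(agent, num_processes=0):
        return agent, num_processes


policy = "policy-agent"

# Called on the class, the first argument is the object that was passed in.
class_call = WithoutSelf.create(policy, num_processes=2)

# Called on an instance, the instance itself is bound to `agent`.
instance = WithoutSelf()
bound_agent, _ = instance.create()

# With @staticmethod, an instance call also receives only the passed arguments.
static_call = WithStaticmethod().create(policy, num_processes=2)

print(class_call, bound_agent is instance, static_call)
```

So whether the agent ends up copying itself depends on the call site: a class-level call passes the given agent through, while an instance-level call binds the instance to the first parameter.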

opened by neighthan 1
Rename **args to **kwargs

Some functions that accept arbitrary keyword arguments (e.g. Agents.__call__) name the keyword arguments args (e.g. __call__(self, workspace, **args)). What would you think of renaming this to __call__(self, workspace, **kwargs)? Even with the ** making it clear these were keyword arguments, I was confused for a bit thinking that these were positional arguments, since it's common to use f(*args, **kwargs) in Python. I'd be happy to submit a PR renaming any **args to **kwargs if you'll accept it; it shouldn't change anything for users since that name can only be used inside the functions. Otherwise, feel free to close this.
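That the rename is invisible to callers can be checked with a small example: the name after ** only exists inside the function body. The two functions below are hypothetical stand-ins, not salina code.

```python
import inspect


def call_with_args(workspace, **args):
    # `args` is a dict of keyword arguments despite its name.
    return sorted(args.items())


def call_with_kwargs(workspace, **kwargs):
    return sorted(kwargs.items())


before = call_with_args("ws", t=0, n_steps=80)
after = call_with_kwargs("ws", t=0, n_steps=80)

# Both parameters are of the VAR_KEYWORD kind; only the local name differs.
kind_before = inspect.signature(call_with_args).parameters["args"].kind
kind_after = inspect.signature(call_with_kwargs).parameters["kwargs"].kind

print(before == after, kind_before == kind_after)
```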

opened by neighthan 1
Do you have any plans to support flexible sizes in the workspace?

Hi,

I've recently been using your library for a reinforcement learning project. More precisely, I'm interested in multi-agent systems where the environment provides one reward per agent. In my environment, each step returns rewards of size n_agents x 1. However, the workspace initializes the reward as "torch.tensor([0.0]).float()" at the first time step (t=0). So, at the second time step, the new reward of size "n_agents x 1" cannot be set, because the stored reward has size torch.Size([1]). I think many environments besides mine have multi-dimensional rewards.

I think it would be more useful if your library supported a flexible reward size.

Best regards,

opened by aithlab 2
What game other than CartPole-v0 is the A2C agent good at?

Hi,

I've been working with the a2c example agent you provide, and haven't found any game other than CartPole-v0 that it can learn well. Is there any other game that it is good at?

Thank you so much.

Best, Karin

opened by wooloo1121 3

Owner

Facebook Research

GitHub

Omnidirectional Scene Text Detection with Sequential-free Box Discretization (IJCAI 2019). Including competition model, online demo, etc.

Box_Discretization_Network This repository is built on the pytorch [maskrcnn_benchmark]. The method is the foundation of our ReCTs-competition method

266 Nov 24, 2022

A lightweight Python-based 3D network multi-agent simulator. Uses a cell-based congestion model. Calculates risk, loudness and battery capacities of the agents. Suitable for 3D network optimization tasks.

AMAZ3DSim AMAZ3DSim is a lightweight python-based 3D network multi-agent simulator. It uses a cell-based congestion model. It calculates risk, battery

13 Nov 4, 2022

Weighing Counts: Sequential Crowd Counting by Reinforcement Learning

LibraNet This repository includes the official implementation of LibraNet for crowd counting, presented in our paper: Weighing Counts: Sequential Crow

18 Nov 5, 2022

banditml is a lightweight contextual bandit & reinforcement learning library designed to be used in production Python services.

banditml is a lightweight contextual bandit & reinforcement learning library designed to be used in production Python services. This library is developed by Bandit ML and ex-authors of Facebook's applied reinforcement learning platform, Reagent.

51 Dec 22, 2022

Pytorch implementations of popular off-policy multi-agent reinforcement learning algorithms, including QMix, VDN, MADDPG, and MATD3.

Off-Policy Multi-Agent Reinforcement Learning (MARL) Algorithms This repository contains implementations of various off-policy multi-agent reinforceme

183 Dec 28, 2022

Independent and minimal implementations of some reinforcement learning algorithms using PyTorch (including PPO, A3C, A2C, ...).

PyTorch RL Minimal Implementations There are implementations of some reinforcement learning algorithms, whose characteristics are as follow: Less pack

4 Dec 31, 2022

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.

730 Jan 9, 2023

sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code

sequitur sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code. It implements three differ

305 Dec 21, 2022

Train robotic agents to learn pick and place with deep learning for vision-based manipulation in PyBullet.

Ravens is a collection of simulated tasks in PyBullet for learning vision-based robotic manipulation, with emphasis on pick and place. It features a Gym-like API with 10 tabletop rearrangement tasks, each with (i) a scripted oracle that provides expert demonstrations (for imitation learning), and (ii) reward functions that provide partial credit (for reinforcement learning).

367 Jan 9, 2023

ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge (ManiSkill Challenge), a large-scale learning-from-demonstrations benchmark for object manipulation.

ManiSkill-Learn ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge, a large-scale learning-from-dem

48 Dec 30, 2022

Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX

CQL-JAX This repository implements Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX (FLAX). Implementation is built on

8 Nov 7, 2022

Reinforcement-learning - Repository of the class assignment questions for the course on reinforcement learning

DSE 314/614: Reinforcement Learning This repository containing reinforcement lea

4 Apr 15, 2022

Lightweight mmm - Lightweight (Bayesian) Media Mix Model

Lightweight (Bayesian) Media Mix Model This is not an official Google product. L

342 Jan 3, 2023

Trading environnement for RL agents, backtesting and training.

TradzQAI Trading environnement for RL agents, backtesting and training. Live session with coinbasepro-python is finaly arrived ! Available sessions: L

164 Oct 30, 2022

Lux AI environment interface for RLlib multi-agents

Lux AI interface to RLlib MultiAgentsEnv For Lux AI Season 1 Kaggle competition. LuxAI repo RLlib-multiagents docs Kaggle environments repo Please let

12 Nov 7, 2022

PyTorch implementation of our ICCV 2021 paper, Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents.

4 May 8, 2022

A user-friendly research and development tool built to standardize RL competency assessment for custom agents and environments.

Built with ❤️ by Sam Showalter Contents Overview Installation Dependencies Usage Scripts Standard Execution Environment Development Environment Benchm

1 Nov 18, 2021

Pacman-AI - AI project designed by UC Berkeley. Designed reflex and minimax agents for the game Pacman.

Pacman AI Jussi Doherty CAP 4601 - Introduction to Artificial Intelligence - Fall 2020 Python version 3.0+ Source of this project This repo contains a

1 Jan 3, 2022

Fake-user-agent-traffic-geneator - Python CLI Tool to generate fake traffic against URLs with configurable user-agents

Fake traffic generator for Gartner Demo Generate fake traffic to URLs with custo

3 Oct 31, 2022
