# Avalanche RL: an End-to-End Library for Continual Reinforcement Learning
Avalanche Website | Getting Started | Examples | Tutorial | API Doc | Paper | Twitter
Avalanche RL is a fork of ContinualAI's PyTorch-based framework Avalanche, with the goal of extending its capabilities to Continual Reinforcement Learning (CRL), bootstrapping from the work done on supervised and unsupervised continual learning.
It is designed to support all environments sharing the `gym.Env` interface, handle streams of experiences, provide strategies for RL algorithms, and enable fast prototyping through an extremely flexible and customizable API.
The core structure and design principles of Avalanche remain untouched to ease the learning curve for continual learning practitioners, so we still work with the same modules you can find in Avalanche:
- Benchmarks for managing data and streams of data.
- Training for model training, making use of extensible strategies.
- Evaluation to evaluate the agent on consistent metrics.
- Extras for general utils and building blocks.
- Models containing commonly used model architectures.
- Logging for logging metrics during training/evaluation.
Head over to Avalanche Website to learn more if these concepts sound unfamiliar to you!
## Features
Features added so far in this fork can be summarized and grouped by module.
### Benchmarks
RLScenario introduces a benchmark for RL which augments each experience with an environment (defined through the OpenAI `gym.Env` interface), effectively implementing a "stream of environments" with which the agent can interact to generate data and learn from that interaction during each experience. This concept models the way experiences from the supervised CL setting translate to CRL, moving away from the concept of a Dataset toward a dynamic interaction through which data is generated.
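Here's a quick sketch of that interaction; `scenario` is an `RLScenario` built with one of the generators described below, and `experience.environment` is an assumption about the attribute through which each experience exposes its `gym.Env`:

```python
# Sketch only: interact with the environment carried by each experience.
for experience in scenario.train_stream:
    env = experience.environment  # assumption: the gym.Env attached to this experience
    obs = env.reset()
    done = False
    while not done:
        # a trained agent would sample actions from its policy instead
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
```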
RL Benchmark Generators let you build these streams of experiences seamlessly, supporting:

- Any sequence of `gym.Env` environments through `gym_benchmark_generator`, which returns an `RLScenario` from a list of environment ids (e.g. `["CartPole-v1", "MountainCar-v0", ..]`) with access to a train and test stream just like in Avalanche. It also supports sampling a random number of environments if you wanna get wild with your experiments.
- Atari 2600 games through `atari_benchmark_generator`, taking care of common Wrappers (e.g. frame stacking) for these environments to get you started even more quickly (see the sketch right after this list).
- Habitat, more on this later.
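A hedged sketch of the generators: the `gym_benchmark_generator` call mirrors the Quick Example below, while the arguments passed to `atari_benchmark_generator` (and its import path) are assumptions modeled on its sibling:

```python
from avalanche.benchmarks.generators.rl_benchmark_generators import gym_benchmark_generator
# assumption: atari_benchmark_generator lives in the same module as gym_benchmark_generator
from avalanche.benchmarks.generators.rl_benchmark_generators import atari_benchmark_generator

# Stream of two classic-control tasks (arguments as in the Quick Example below)
scenario = gym_benchmark_generator(
    ["CartPole-v1", "MountainCar-v0"], n_experiences=2,
    eval_envs=["CartPole-v1", "MountainCar-v0"])

# Stream of Atari games; common wrappers (e.g. frame stacking) are documented to be
# handled by the generator itself (the argument names here are assumptions)
atari_scenario = atari_benchmark_generator(
    ["PongNoFrameskip-v4", "BreakoutNoFrameskip-v4"], n_experiences=2)
```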
### Training
RLBaseStrategy is the super-class of all RL algorithms, augmenting `BaseStrategy` with RL-specific callbacks while still making use of all major features such as plugins, logging and callbacks. Inspired by the amazing stable-baselines-3, it supports both on- and off-policy algorithms under a common API defined as a 'rollouts phase' (data gathering) followed by an 'update phase', whose specifics are implemented by subclasses (RL algorithms).
Algorithms are added to the framework by subclassing `RLBaseStrategy` and implementing specific callbacks, as sketched below. You can check out this implementation of A2C in under 50 lines of actual code, including the `update` step and the action sampling mechanism. Currently only the A2C and DQN/Double DQN algorithms have been implemented, along with various other utilities such as a Replay Buffer.
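To give an idea of the extension point, here is a minimal sketch of a custom algorithm. The import path is assumed to match `A2CStrategy`'s, and the hook names `sample_rollout_action` and `update` are assumptions about the callbacks `RLBaseStrategy` exposes for the two phases:

```python
import torch
# assumption: RLBaseStrategy lives alongside A2CStrategy
from avalanche.training.strategies.reinforcement_learning import RLBaseStrategy


class MyPolicyGradient(RLBaseStrategy):
    """Sketch of a custom algorithm (hook names are assumptions)."""

    def sample_rollout_action(self, observations: torch.Tensor):
        # rollout phase: query the current policy for actions (placeholder logic)
        with torch.no_grad():
            return self.model(observations).argmax(dim=-1)

    def update(self, rollouts):
        # update phase: turn the gathered rollouts into a loss and optimize
        loss = self.compute_loss(rollouts)  # hypothetical helper
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
```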
Training with multiple agents is supported through `VectorizedEnv`, leveraging Ray for parallel and potentially distributed execution of multiple environment interactions.
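The most direct way to get parallel environment interaction is the generator's `n_parallel_envs` argument (also used in the Quick Example below), which is presumably backed by `VectorizedEnv`:

```python
from avalanche.benchmarks.generators.rl_benchmark_generators import gym_benchmark_generator

# Each experience interacts with 4 copies of the environment in parallel,
# handled through Ray behind the scenes.
scenario = gym_benchmark_generator(
    ["CartPole-v1"], n_experiences=1, n_parallel_envs=4,
    eval_envs=["CartPole-v1"])
```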
### Evaluation
New metrics have been added to keep track of rewards, episode lengths and any kind of scalar value (such as the Epsilon Greedy 'eps') during experiments. Metrics are tracked over a moving-average window, which is useful for smoothing out fluctuations and for recording standard deviations and the maximum values reached.
### Extras
Several common environment Wrappers are also kept here, as we encourage the use of this pattern to adapt environment outputs to your needs. We also provide common gym control environments which have been "parametrized", so you can tweak values such as force and gravity to test new ideas quickly and reliably on well-known testbeds. These environments are available by pre-pending a `C` to the env id, as in `CCartPole-v1`, and are registered on first import.
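For instance (a sketch only: the import that triggers registration and the keyword arguments accepted by the parametrized CartPole are assumptions here):

```python
import gym

# assumption: importing the benchmark generators registers the 'C*' environments
import avalanche.benchmarks.generators.rl_benchmark_generators  # noqa: F401

# kwargs mirror the attributes of gym's CartPole and are assumptions about this env's interface
env = gym.make("CCartPole-v1", gravity=12.0, force_mag=20.0)
```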
### Models
In this module you can find implementations of both MLPs and CNNs for deep Q-learning and actor-critic approaches, adapted from popular papers such as "Human-level Control Through Deep Reinforcement Learning" and "Overcoming Catastrophic Forgetting in Neural Networks", to learn directly from pixels or states.
### Logging
A Tqdm-based interactive logger has been added to improve readability, along with sensible default loggers for RL algorithms.
## Quick Example
```python
import torch
from torch.optim import Adam
from avalanche.benchmarks.generators.rl_benchmark_generators import gym_benchmark_generator
from avalanche.models.actor_critic import ActorCriticMLP
from avalanche.training.strategies.reinforcement_learning import A2CStrategy

# Config
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Model
model = ActorCriticMLP(num_inputs=4, num_actions=2, actor_hidden_sizes=1024, critic_hidden_sizes=1024)

# CRL Benchmark Creation
scenario = gym_benchmark_generator(['CartPole-v1'], n_experiences=1, n_parallel_envs=1,
                                   eval_envs=['CartPole-v1'])

# Prepare for training & testing
optimizer = Adam(model.parameters(), lr=1e-4)

# Reinforcement Learning strategy
strategy = A2CStrategy(model, optimizer, per_experience_steps=10000, max_steps_per_rollout=5,
                       device=device, eval_every=1000, eval_episodes=10)

# train and test loop
results = []
for experience in scenario.train_stream:
    strategy.train(experience)
    results.append(strategy.eval(scenario.test_stream))
```
Compare it with the vanilla Avalanche snippet!
Check out more examples here (advanced ones coming soon) or in the unit tests. We also include a small-scale reproduction of the experiments from the original EWC paper (DeepMind).
## Installation
As this fork is still under development, the recommended way to install it is to clone this repo with `git clone https://github.com/NickLucche/avalanche.git` and then follow the Avalanche guide for installing as a developer. Spoiler: just run `conda env update --file environment-dev.yml` to update your current environment with the Avalanche RL dependencies. Currently, the only added dependency is `ray`.
## Disclaimer
This fork is under active development, so expect changes on the main branch on a fairly regular basis. As Avalanche itself is still in its early Alpha versions, it's only fair to say that Avalanche RL is in super-duper pre-Alpha.
We believe there's lots of room for improvement and tweaking, but at the same time there's much that can be offered to the growing community of continual learning practitioners approaching reinforcement learning, by allowing experiments to be performed under a common framework with a well-defined structure.