The Empirical Investigation of Representation Learning for Imitation (EIRLI)

Center for Human-Compatible AI

Last update: Nov 6, 2022

Overview

Over the past handful of years, representation learning has exploded as a subfield, and with it has come a plethora of new methods, each slightly different from the others.

Our Empirical Investigation of Representation Learning for Imitation (EIRLI) has two main goals:

To create a modular algorithm definition system that allows researchers to easily pick and choose from a wide array of commonly used design axes
To facilitate testing of representations within the context of sequential learning, particularly imitation learning and offline reinforcement learning

Common Use Cases

Do you want to…

Reproduce our results? You can find scripts and instructions here to help reproduce our benchmark results.
Design and experiment with a new representation learning algorithm using our modular components? You can find documentation on that here.
Use our algorithm definitions in a setting other than sequential learning? The base example here demonstrates this simplified use case.

Otherwise, you can see our full ReadTheDocs documentation here.

Modular Algorithm Design

This library was designed to break down the definition of a representation learning algorithm into several key parts. The intention was that this system be flexible enough that many commonly used algorithms can be defined through different combinations of these modular components.

The design relies on the central concept of a "context" and a "target". In very rough terms, all of our algorithms work by applying some transformation to the context, some transformation to the target, and then calculating a loss as a function of those two transformed objects. Sometimes an extra context object (such as an action) is also passed in.

Some examples are:

In SimCLR, the context and target are the same image frame, and augmentation followed by encoding is applied to both context and target. The learned representations are sent through a decoder, and then the context and target representations are pulled together with a contrastive loss.
In TemporalCPC, the context is a frame at time t and the target is a frame at time t+k. Then, similarly to SimCLR above, augmentation is applied to each frame before it is put through an encoder, and the two resulting representations are pulled together with a contrastive loss.
In a Variational Autoencoder, the context and target are the same image frame. A bottleneck encoder and then a reconstructive decoder are applied to the context, and this reconstructed context is compared to the target through an L2 pixel loss.
A Dynamics Prediction model can be seen as a conceptual combination of an autoencoder (which tries to predict the current full image frame) and TemporalCPC (which predicts future information based on current information). In the case of a Dynamics model, we predict a future frame (the target) given the current frame (the context) and an action as extra context.
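SimCLR and TemporalCPC above both pull the context and target representations together with a contrastive loss. As a rough illustration, here is a minimal NumPy sketch of an InfoNCE-style loss, assuming row i of the context batch is the positive for row i of the target batch and every other row is a negative. The function name and the temperature of 0.1 are illustrative choices, not the library's LossCalculator API, and SimCLR's own NT-Xent loss additionally draws negatives from within each view.

```python
import numpy as np

def info_nce_loss(context, target, temperature=0.1):
    """InfoNCE-style loss: row i of `context` should match row i of `target`.

    context, target: arrays of shape (batch, dim); the other rows of the
    batch act as negatives.
    """
    # L2-normalise so that dot products are cosine similarities
    c = context / np.linalg.norm(context, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = c @ t.T / temperature  # (batch, batch) similarity matrix
    # numerically stable log-softmax over each row
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # the positive pairs lie on the diagonal
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce_loss(z, z + 0.01 * rng.normal(size=z.shape))  # matching pairs
rolled = info_nce_loss(z, np.roll(z, 1, axis=0))                 # every pair mismatched
print(aligned, rolled)
```

The loss is low when each context is closest to its own target and grows when the pairing is broken, which is the "pulling together" described above.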

This abstraction isn't perfect, but we believe it is coherent enough to allow for a good number of shared mechanisms between algorithms, and flexible enough to support a wide variety of them.

The modular design mentioned above is facilitated through a number of class interfaces, each of which handles a different component of the algorithm. An algorithm is defined by selecting implementations of these shared interfaces and passing them as arguments to a RepresentationLearner, which handles the base machinery of performing the transformations.

TargetPairConstructer - This component takes in a set of trajectories (assumed to be iterators of dicts containing an 'obs' key and optional 'acts' and 'dones' keys) and creates a dataset of (context, target, optional extra context) pairs that will be shuffled to form the training set.
Augmenter - This component governs whether either or both of the context and target objects are augmented before being passed to the encoder. Note that this concept only meaningfully applies when the object being augmented is an image frame.
Encoder - The encoder is responsible for taking in an image frame and producing a learned vector representation. It is optionally chained with a Decoder to produce the input to the loss function (which may be a reconstructed image in the case of VAE or Dynamics, or a projected version of the learned representation in the case of contrastive methods like SimCLR that use a projection head).
Decoder - As mentioned above, the Decoder acts as a bridge between the representation in the form you want to use for transfer and whatever input your loss function requires, which is often some transformation of that canonical representation.
BatchExtender - This component is used when you want to calculate loss on batch elements that are not part of the batch that went through your encoder and decoder on this step. It is primarily used by contrastive methods that use momentum, since in that case you want to use elements from a cached store of previously calculated representations as negatives in your contrastive loss.
LossCalculator - This component takes in the transformed context and transformed target and handles the loss calculation, along with any transformations that need to happen as a part of that calculation.
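To make the division of labour concrete, here is a minimal NumPy sketch of one forward pass through a TemporalCPC-like chain: a temporal-offset pair constructor, an (identity) augmenter, a shared linear encoder, an identity decoder, and a contrastive loss calculator. Every function here is a hypothetical stand-in written for illustration, not one of the library's actual classes or signatures; a BatchExtender is omitted and no gradient step is taken.

```python
import numpy as np

# Hypothetical stand-ins for the components above; names and signatures are
# illustrative only and do not match the library's real classes.

def temporal_offset_pairs(obs, acts, k=1):
    """Pair constructor: context = frame t, target = frame t+k, extra context = action t."""
    n = len(obs) - k
    return obs[:n], obs[k:k + n], acts[:n]

def identity_augmenter(frames):
    # A real augmenter would apply random crops, colour jitter, etc. to image frames.
    return frames

def make_linear_encoder(in_dim, rep_dim, rng):
    weights = rng.normal(scale=in_dim ** -0.5, size=(in_dim, rep_dim))
    return lambda frames: frames.reshape(len(frames), -1) @ weights

def identity_decoder(reps):
    # SimCLR-style methods would put a projection head here instead.
    return reps

def contrastive_loss(context, target, temperature=0.1):
    c = context / np.linalg.norm(context, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = c @ t.T / temperature
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))  # positives on the diagonal

rng = np.random.default_rng(0)
obs = rng.normal(size=(20, 8, 8))   # 20 fake 8x8 "frames" from one trajectory
acts = rng.integers(0, 4, size=20)  # fake discrete actions
encoder = make_linear_encoder(in_dim=64, rep_dim=32, rng=rng)  # shared by context and target

context, target, extra = temporal_offset_pairs(obs, acts, k=2)
z_context = identity_decoder(encoder(identity_augmenter(context)))
z_target = identity_decoder(encoder(identity_augmenter(target)))
loss = contrastive_loss(z_context, z_target)
print(context.shape, target.shape, extra.shape, float(loss))
```

Returning the same frame as both context and target, and plugging in a real augmenter, would move this chain towards the SimCLR setup described earlier; that interchangeability is the point of splitting an algorithm into these parts.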

Training Scripts

In addition to machinery for constructing algorithms, the repo contains a set of Sacred-based training scripts for testing different representation learning algorithms as either pretraining or joint training components within an imitation learning pipeline. These are most likely to fit your use case if you want to reproduce our results or train models in similar settings.

Comments

Bridging pretrain and adaptation
This PR aims to:

Create a Python script that calls run_rep_learner.py first and then il_train.py. This script should support the normal pretrain -> adapt workflow, and also grid search over different configurations.

Modify relevant files (e.g., for saving and loading models).
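A driver of this kind could simply build the two command lines for every point in a configuration grid. Below is a minimal sketch under stated assumptions: it relies only on Sacred's standard `with key=value` command-line override syntax, while the config keys (`algo`, `seed`, `save_path`, `encoder_path`) and the output directory layout are hypothetical placeholders, not the repo's real config names.

```python
import itertools
import shlex

# Hypothetical grid; these keys are placeholders, not the repo's real Sacred config names.
GRID = {"algo": ["SimCLR", "TemporalCPC"], "seed": [0, 1]}

def pretrain_then_adapt_commands(grid):
    """Yield a (pretrain_cmd, adapt_cmd) pair for every point in the grid."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        cfg = dict(zip(keys, values))
        run_name = "_".join(f"{key}-{value}" for key, value in cfg.items())
        encoder_path = f"runs/{run_name}/encoder.pt"  # hypothetical location
        overrides = [f"{key}={value}" for key, value in cfg.items()]
        # Sacred scripts accept config overrides after the `with` keyword.
        pretrain = ["python", "run_rep_learner.py", "with", *overrides,
                    f"save_path={encoder_path}"]
        adapt = ["python", "il_train.py", "with", f"seed={cfg['seed']}",
                 f"encoder_path={encoder_path}"]
        yield pretrain, adapt

commands = list(pretrain_then_adapt_commands(GRID))
for pretrain, adapt in commands:
    print(shlex.join(pretrain))
    print(shlex.join(adapt))
```

To actually execute the runs, each pair could be passed in order to `subprocess.run(cmd, check=True)`, so that adaptation only starts once pretraining has succeeded.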
opened by RPC2 11
Initial Draft of Hypothesis 3 Experiment Code
This is a draft PR sketching out the code to implement the tests in Hypothesis 3 of our experiment spreadsheet.

It attempts to implement:

For each benchmark, for each environment, and for each dataset structure:

if MAGICAL: train k seeds of RepL for each algo, then subsequently k*j seeds of IL on those pretrained RepL models

if DMC: train k*j seeds of RepL+IL without reuse
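One way to read this plan is as an enumeration of runs per (algorithm, environment, dataset) cell. The sketch below assumes that on MAGICAL each of the k pretrained RepL seeds is reused by j IL seeds (giving k*j IL runs in total), while on DMC every run trains RepL and IL together from scratch; the function name and run tuples are hypothetical.

```python
def plan_runs(benchmark, k, j):
    """Return hypothetical (stage, repl_seed, il_seed) tuples for one cell of the grid."""
    runs = []
    if benchmark == "magical":
        for repl_seed in range(k):
            runs.append(("repl", repl_seed, None))       # pretrain once per RepL seed
            for il_seed in range(j):
                runs.append(("il", repl_seed, il_seed))  # reuse that encoder j times
    elif benchmark == "dmc":
        for seed in range(k * j):
            runs.append(("repl+il", seed, seed))         # no reuse: fresh RepL each time
    else:
        raise ValueError(f"unknown benchmark: {benchmark!r}")
    return runs

magical = plan_runs("magical", k=2, j=3)
dmc = plan_runs("dmc", k=2, j=3)
print(sum(r[0] == "repl" for r in magical), sum(r[0] == "il" for r in magical), len(dmc))
```

Both branches end with k*j trained policies per cell; reuse on MAGICAL simply replaces k*j pretraining runs with k.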
opened by decodyng 6
Support for multi-task/multi-data-source repL
Changes the repL data-loading pipeline to use a new, uniform on-disk format. This should eventually allow us to do multi-task training; to train on imitation policy rollouts; to train on random rollouts; etc.

Things that this PR does, or will do:

Introduces a new demonstration format based on webdataset.

Refactors the repl experiment to exclusively read demonstrations in this format.

Creates mkdataset_demos.py script that can convert the "legacy" data formats into the new unified format.

Creates mkdataset_random.py to generate random rollouts in the new unified format.

Splits the benchmark Sacred ingredient into three parts: env, venv_opts, and env_data. The first contains all (and only) the options necessary to construct environments. The second contains additional options necessary to configure vec envs. The last contains hardcoded paths to data of different types.

Adds config options to the repl experiment that allow it to use multiple types of data, rather than just single-task demos.

This PR does not make il_train.py use the new data format, although in principle that should be possible (just unnecessary, since we're not doing multi-task training).

Also, a warning: I had to rewrite the target pair constructors and parts of the repL training loop to work with the new dataset format. There might be some bugs related to those things which our tests have not caught. If reviewing this, pay special attention to those parts.
opened by qxcv 5
Revive GAIL
This PR makes GAIL work again, and makes a few related changes too:

Updates imitation/SB3 deps (this means people will need to reinstall them from requirements.txt; should ping Slack when this is merged).

Makes MineCraft tests skippable if minerl is not installed. This approach would be easy to extend to make dm_control skippable if MuJoCo is not installed, make MAGICAL skippable if no X server is running, etc.

Solves MoveToCorner

There are also some unrelated things it does (because I wanted to use the fixes above without making a new PR):

Slightly adjusts VAE hyperparams so that VAE actually works

Adds some experiment configs that I was using to test GAIL

Adds data generation scripts for ICML

Fixes some issues with the skopt tuning code which needed to be fixed in order to tune GAIL

Things it doesn't do:

Get DMC environments to reach adequate reward. This needs a bit more hyperparameter tuning.
opened by qxcv 4
Minimal Viable Version of Dataset-Level Augmentation

This is a minimal change to allow augmentation to happen at the dataset level rather than the batch level. This is probably going to be provisional (either due to changes in Sam's PR, or due to changes to use Kornia), so I expect this to change, but I wanted an MVP version of this functionality to exist early for CIFAR testing.

opened by decodyng 4
Fixes to Repl Reuse Functionality
This PR:

Adds logic to only create a new repl seed if one hasn't already been set. This makes it possible to set the seed explicitly in the config (in which case it will be part of the encoder-lookup hash), or else leave it empty (in which case it will not be part of the hash, and will be randomly generated)

Sorts config dictionaries recursively, and more rigorously, before jsonpickle-ing and hashing them

Adds logic for an is_multitask flag, and ensures that if a RepL task is set as multitask, its env_cfg.task_name isn't included in its config hash

I have not yet confirmed the consistency of hashes across OSes, but plan to do so tomorrow morning, at which point I'll move this PR out of draft status, but since we're under time constraints, it seemed good to put what had been done up for review now.

[Note: I spent quite a while trying to implement a test that would check the env_cfg multitask functionality, but ran into difficulties: MAGICAL is the setting where we'd be planning to do multitask training, since DMC has different action sizes, and MAGICAL still errors out on my machine due to the NSFWindow bug]
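The hashing scheme described in this PR can be illustrated with a minimal standard-library sketch. It assumes a plain JSON-serialisable config dict and uses the json module in place of jsonpickle; the keys is_multitask, env_cfg.task_name and seed come from the description above, but the function names and exact config layout are hypothetical.

```python
import hashlib
import json

def sort_recursively(obj):
    """Return a copy of obj in which every nested dict has its keys in sorted order."""
    if isinstance(obj, dict):
        return {key: sort_recursively(obj[key]) for key in sorted(obj)}
    if isinstance(obj, (list, tuple)):
        return [sort_recursively(item) for item in obj]
    return obj

def repl_config_hash(config):
    """Hash a RepL config so that equivalent configs map to the same encoder."""
    cfg = json.loads(json.dumps(config))  # deep copy of a JSON-compatible config
    if cfg.get("seed") is None:
        # An unset seed is excluded from the hash (it would be generated randomly).
        cfg.pop("seed", None)
    if cfg.get("is_multitask"):
        # Multitask encoders are shared across tasks, so drop the task name.
        cfg.get("env_cfg", {}).pop("task_name", None)
    payload = json.dumps(sort_recursively(cfg))  # dicts keep their sorted insertion order
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Using SHA-256 over a UTF-8 string gives the same digest on every platform, unlike Python's built-in `hash()` for strings, which is randomised per process. Because the task name is removed before hashing, two multitask configs that differ only in env_cfg.task_name hash identically and can share one pretrained encoder.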
opened by decodyng 3
Initial implementation of contrastive inverse dynamics

Copied from the docs:

Like InverseDynamicsPrediction, except it uses a contrastive loss function.

During the decoder stage, instead of predicting an action, we simply concatenate the representations of s and s' together. During the encoder stage, we need to also encode the actions into a vector of size 2 * representation_dim, so that they can be contrasted against the concatenation of encodings of s and s'.
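The shapes involved can be checked with a small NumPy sketch. It assumes linear stand-in encoders for states and actions (the weight matrices and dimensions below are hypothetical) and scores every (s, s') pair in the batch against every action, treating matching indices as positives.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, obs_dim, act_dim, rep_dim = 4, 10, 3, 8

# Hypothetical linear encoders standing in for the real networks.
w_state = rng.normal(size=(obs_dim, rep_dim))
w_action = rng.normal(size=(act_dim, 2 * rep_dim))  # actions map to 2 * rep_dim

s = rng.normal(size=(batch, obs_dim))
s_next = rng.normal(size=(batch, obs_dim))
actions = rng.normal(size=(batch, act_dim))

# Decoder stage: concatenate the representations of s and s'.
context = np.concatenate([s @ w_state, s_next @ w_state], axis=1)  # (batch, 2 * rep_dim)
# Encoder stage for the target: embed the actions into the same 2 * rep_dim space.
target = actions @ w_action                                        # (batch, 2 * rep_dim)

# Contrastive (InfoNCE-style) loss: transition i should score highest with action i.
logits = context @ target.T
logits -= logits.max(axis=1, keepdims=True)
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))
print(context.shape, target.shape, loss)
```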

opened by rohinmshah 3
BC script: Add LR scheduler option

Pointing imitation requirement to https://github.com/HumanCompatibleAI/imitation/pull/259 's branch for now. Will update requirements to new commit once https://github.com/HumanCompatibleAI/imitation/pull/259 is merged into imitation's ILR dev branch.

opened by shwang 3
Refactor Standard Deviation Calculation to Avoid Explosion/Test Flakiness
This PR:

Refactors the decoder projection head to learn a standard deviation (when it does learn one) as a function of the encoder-predicted mean, not the encoder-predicted standard deviation. This avoids the "NN on top of an exponential" effect that we believe to have been the cause of exploding values

Changes the stochastic parameter of the encoder to learn_scale, so that the decoder and encoder have consistent parameters, which both correspond to "should I learn a standard deviation". In the encoder case, not learning one results in returning a constant. In the decoder case, not learning one results in returning whatever standard deviation was returned by the encoder.
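A minimal NumPy sketch of the two learn_scale behaviours described above. It assumes linear layers with hypothetical weight names and uses a softplus in both heads to keep the learnt standard deviation positive (the real code may use a different positivity transform). The key point is that the decoder's standard deviation is computed from the encoder's mean rather than by passing the encoder's standard deviation through another network.

```python
import numpy as np

def softplus(x):
    return np.logaddexp(0.0, x)  # log(1 + exp(x)) without overflow

def encoder_head(features, w_mean, w_scale, learn_scale):
    mean = features @ w_mean
    if learn_scale:
        std = softplus(features @ w_scale) + 1e-6
    else:
        std = np.ones_like(mean)  # not learning a scale: return a constant
    return mean, std

def decoder_head(mean, encoder_std, w_proj, w_scale, learn_scale):
    projected = mean @ w_proj
    if learn_scale:
        # std is a function of the encoder-predicted *mean*, not of encoder_std
        std = softplus(mean @ w_scale) + 1e-6
    else:
        std = encoder_std  # reuse whatever the encoder returned
    return projected, std

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 12))
w_enc_mean, w_enc_scale = rng.normal(size=(12, 4)), rng.normal(size=(12, 4))
w_proj, w_dec_scale = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))

mean, enc_std = encoder_head(features, w_enc_mean, w_enc_scale, learn_scale=True)
_, dec_std = decoder_head(mean, enc_std, w_proj, w_dec_scale, learn_scale=True)
_, passthrough_std = decoder_head(mean, enc_std, w_proj, w_dec_scale, learn_scale=False)
_, const_std = encoder_head(features, w_enc_mean, w_enc_scale, learn_scale=False)
```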
opened by decodyng 3
Per-Env Hyperparameter Tuning
This PR has the goal of writing hyperparameter tuning scripts that can be run per-environment, to try to find optimal hyperparameters on a per-env, per-algorithm basis. This involved:

Moving hyperparameter tuning logic to live in its own file, separate from pretrain_n_adapt

Defining search spaces for some new algorithms

Writing a bash script to run everything together
opened by decodyng 2
Misc Changes for CEB Testing
This PR contains a number of small changes implemented in the process of running CEB tests.

Creates a MultiLogger object, which is a lightweight wrapper around both a logger and a Torch SummaryWriter, and which can be more easily passed into objects to allow them to add to logs. So far I've only added it to the augmenter and loss constructor

Adds a TargetProjection decoder which only adds a projection layer to the target, and not to the context, to allow for symmetry-breaking between the two, and heuristic approximation of learnt bilinear loss

Allows for reading in expert trajectories [Now deprecated, given Sam's changes to Master]

Cleans up Sacred parameters such that it's easier to add ad-hoc parameters without needing to use --force (which had been necessary if you added a parameter to the algo hyperparameter dictionary)

Moves some kwarg-handling logic from __init__ into RepresentationLearner
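Why a target-only projection approximates a bilinear loss can be seen with a linear projection W: scoring the unprojected context c against the projected target W t gives c^T W t, which is exactly a bilinear form, and because W need not be symmetric the two sides are treated differently. A minimal NumPy check, with a hypothetical random W standing in for a learnt layer:

```python
import numpy as np

rng = np.random.default_rng(0)
rep_dim = 6
W = rng.normal(size=(rep_dim, rep_dim))  # hypothetical learnable target projection

def target_projection(context, target):
    """Project only the target; the context passes through unchanged."""
    return context, target @ W.T  # row j becomes W @ target[j]

c = rng.normal(size=(5, rep_dim))
t = rng.normal(size=(5, rep_dim))
c_out, t_out = target_projection(c, t)
scores = c_out @ t_out.T                        # dot products after projection
bilinear = np.einsum("id,de,je->ij", c, W, t)   # c_i^T W t_j
print(np.allclose(scores, bilinear))
```

With a nonlinear projection head the equivalence no longer holds exactly, which is presumably why the PR calls this a heuristic approximation.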
opened by decodyng 2
Imitation Learning Baseline Code

Hi @qxcv, first of all thanks a lot for open-sourcing the codebase for your amazing work. The codebase is indeed huge; I was wondering whether this repository contains code for end-to-end imitation learning without any representation learning (i.e. without pretraining/joint training).

Also, do you have a small dataset so that I can check whether it works on my end at a small scale?

I also see that you have provided two datasets; can you please explain which tasks each one covers?

Thanks!

opened by nileshop22 8
Fully Integrate Minecraft Dict Environments
This PR works to make Minecraft environments more generally compatible with the ILR training framework, including those environments that have Dict Action/Environment spaces

NOTE: This is a draft PR until the branches torch_conversion and then ilr_wrappers get merged into realistic_benchmarks, since the functionality here depends on implementations new to those branches.

Modifications in this branch:

Modify Minecraft data-loading and env-loading code so that you can pass in an arbitrary set of wrappers in an env_cfg config entry, and have those wrap both the observations/actions coming from the loaded dataset and the actual live environment (previously, there was a single hardcoded wrapper applied to all Minecraft envs)

Add config options for using a SpaceFlattenedActorCriticPolicy (implemented in realistic_benchmarks) for environments that require it (i.e. environments with Dict-like Action spaces)

Modify our test configs list such that we only add Minecraft to the list of test configs when it is available as a benchmark (required because the actual config entry itself now requires an import from realistic_benchmarks, which relies on minerl being installed)
opened by decodyng 0
Additional tooling
This cleans up some of the existing tooling and adds some more. Specifically:

Removes requirements.txt and adds all deps to setup.py instead.

Adds run_rep_learner, il_train and il_test commands to entry_points so that we don't have to do python src/il_representations/scripts/{name}.py.

Automatically runs flake8 and isort as part of the unit tests. This uses the pytest-isort and pytest-flake8 plugins.

Adds a reformat.sh script that automatically cleans up imports with isort and autoflake, then reformats to be PEP8-compliant with yapf. This can handle most of the problems identified by pytest-isort and pytest-flake8. My hope is that from now on, we can just run reformat.sh before committing but otherwise not have to worry about formatting at all (except wrapping strings and comments).
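For reference, setuptools console-script entry points are declared as "command=module:function" strings. The sketch below assumes the package is importable as il_representations (matching the src/il_representations/scripts/ path above) and that each script module exposes a main() function; that function name is a hypothetical placeholder.

```python
# Hypothetical: assumes each script module exposes a `main` callable.
SCRIPT_NAMES = ["run_rep_learner", "il_train", "il_test"]

ENTRY_POINTS = {
    "console_scripts": [
        f"{name}=il_representations.scripts.{name}:main" for name in SCRIPT_NAMES
    ]
}

# In setup.py this would be passed as: setuptools.setup(..., entry_points=ENTRY_POINTS)
print("\n".join(ENTRY_POINTS["console_scripts"]))
```

After installing the package (for example with `pip install -e .`), pip generates an executable for each entry that imports the named module and calls the named function.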

This PR also reformats ALL the code to be compliant with the new flake8 style. I've rebased everything into two commits: one that makes the tooling changes, and one that makes the (mostly automated) formatting changes. This should make it easier to review.

One potentially controversial choice I've made here is the line length of 100. I noticed that Cody and Cynthia's code was mostly around ~120 columns. I personally prefer shorter line lengths because I can fit more code on my screen; e.g. <=83 columns means I can fit five files, <=105 columns means I can fit four, and <~130 means I can fit three. I chose 100 as a compromise that makes it easy to fit several files on the screen at the same time without forcing us to wrap too much. I'm open to discussions about that choice, as well as discussions about how we can tweak yapf to make files look nice :smile:

Another thing: this PR requires everyone to uninstall & reinstall MAGICAL (if they have it installed already). I made some changes that are necessary to work with the latest version of Pyglet, but apparently those won't automatically get installed by pip if you have an existing version :disappointed:
opened by qxcv 2
CIFAR-10 evaluation

Adds run_cifar.py, which runs representation learning with a ResNet-18 on CIFAR-10, finetunes a linear layer on top of it, and evaluates the accuracy of the resulting classifier. Hyperparameters are set to mimic SimCLR: https://github.com/google-research/simclr/

The current implementation still depends on an incorrect loss function (to be fixed in #10) and augments the examples at the batch level instead of the dataset level (to be fixed in a different PR).

opened by rohinmshah 1

Owner

Center for Human-Compatible AI

CHAI seeks to develop the conceptual and technical wherewithal to reorient the general thrust of AI research towards provably beneficial systems.

An investigation project for SISR.

SISR-Survey An investigation project for SISR. This repository is an official project of the paper "From Beginner to Master: A Survey for Deep Learnin

79 Oct 20, 2022

Empirical Study of Transformers for Source Code & A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code

Transformers for variable misuse, function naming and code completion tasks The official PyTorch implementation of: Empirical Study of Transformers fo

56 Nov 15, 2022

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

pytorch-a2c-ppo-acktr Update (April 12th, 2021) PPO is great, but Soft Actor Critic can be better for many continuous control tasks. Please check out

3k Jan 9, 2023

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

3k Dec 31, 2022

Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning Source Code

8 Sep 14, 2022

Tilted Empirical Risk Minimization (ICLR '21)

Tilted Empirical Risk Minimization This repository contains the implementation for the paper Tilted Empirical Risk Minimization ICLR 2021 Empirical ri

40 Nov 28, 2022

A PyTorch implementation of the paper Mixup: Beyond Empirical Risk Minimization in PyTorch

Mixup: Beyond Empirical Risk Minimization in PyTorch This is an unofficial PyTorch implementation of mixup: Beyond Empirical Risk Minimization. The co

121 Dec 17, 2022

[ICCV-2021] An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation

An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation (ICCV 2021) Introduction This is an official pytorch implemen

42 Jan 4, 2023

Pytorch implementation of the AAAI 2022 paper "Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification"

[AAAI22] Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification We point out the overlooked unbiasedness in long-tailed clas

28 Oct 18, 2022

A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval

CLIP4CMR A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval The original data and pre-calculate

9 Jan 12, 2022

Disagreement-Regularized Imitation Learning

Due to a normalization bug the expert trajectories have lower performance than the rl_baseline_zoo reported experts. Please see the following link in

25 Apr 28, 2022

Scripts for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation and a convolutional neural network (CNN) for image classification

About subwAI subwAI - a project for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation

82 Jan 1, 2023

[CVPR 2022] PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision (Oral)

PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision Kehong Gong*, Bingbing Li*, Jianfeng Zhang*, Ta

256 Dec 28, 2022

Eff video representation - Efficient video representation through neural fields

Neural Residual Flow Fields for Efficient Video Representations 1. Download MPI

41 Jan 6, 2023

ilpyt: imitation learning library with modular, baseline implementations in Pytorch

Visual Adversarial Imitation Learning using Variational Models (VMAIL)

Code for NeurIPS 2021 paper: Invariant Causal Imitation Learning for Generalizable Policies

PyTorch implementation of SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching

Pytorch code for "State-only Imitation with Transition Dynamics Mismatch" (ICLR 2020)
