# Robust On-Policy Data Collection for Data-Efficient Policy Evaluation

Source code for *Robust On-Policy Data Collection for Data-Efficient Policy Evaluation* (NeurIPS 2021 Workshop on Offline Reinforcement Learning). The code is written in Python 3, using PyTorch for the implementation of the deep networks and OpenAI Gym for the experiment domains.
## Requirements
To install the required dependencies, it is recommended to first create a conda or virtual environment. Then run:

```bash
pip install -r requirements.txt
```
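For example, with conda (the environment name and Python version below are placeholders):

```bash
conda create -n robust-ope python=3.8
conda activate robust-ope
pip install -r requirements.txt
```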
## Preparation
To conduct policy evaluation, we need to prepare a set of pretrained policies. You can skip this part if you already have the pretrained models in `policy_models/` and the corresponding policy values in `experiments/policy_info.py`.
### Pretrained Policy
Train the policy models using REINFORCE in the different domains by running:

```bash
python policy/reinforce.py --exp_name {exp_name}
```
where `{exp_name}` can be `MultiBandit`, `GridWorld`, `CartPole`, or `CartPoleContinuous`. The parameterized epsilon-greedy policies for MultiBandit and GridWorld can be obtained by running:
```bash
python policy/handmade_policy.py
```
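To train REINFORCE policies for all four domains in one pass, a simple loop over the documented `--exp_name` values suffices (a sketch; all other training settings are left at their defaults):

```bash
for exp in MultiBandit GridWorld CartPole CartPoleContinuous; do
    python policy/reinforce.py --exp_name "$exp"
done
```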
### Policy Value
#### Option 1: Run in sequence
For each policy model, the true policy value is estimated with $10^6$ Monte Carlo roll-outs by running:

```bash
python experiments/policy_value.py --policy_name {policy_name} --seed {seed} --n 1e6
```

This will print the average number of steps, the true policy value, and the variance of returns. Make sure you copy these results into `experiments/policy_info.py`.
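For intuition, the Monte Carlo estimate is just the average return over independent roll-outs, and the variance of those returns is what gets recorded alongside it. Below is a minimal sketch (assuming the pre-0.26 Gym step/reset API and a `policy(obs)` callable that returns an action; not the repository's implementation):

```python
import numpy as np

def mc_policy_value(env, policy, n_rollouts, gamma=1.0):
    """Estimate v(pi) and the return variance from Monte Carlo roll-outs."""
    returns = []
    for _ in range(n_rollouts):
        obs, done, ret, t = env.reset(), False, 0.0, 0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))  # old Gym 4-tuple API
            ret += (gamma ** t) * reward
            t += 1
        returns.append(ret)
    returns = np.asarray(returns)
    return returns.mean(), returns.var()
```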
#### Option 2: Run in parallel
If you can use `qsub` or `sbatch`, you can also run `jobs/jobs_value.py` with different seeds in parallel and merge the outputs with `experiments/merge_values.py` to obtain the $10^6$ Monte Carlo roll-outs. The policy values reported in the paper were obtained in this way.
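For example, with SLURM one could split the $10^6$ roll-outs across ten seeds (the per-seed split and the `sbatch --wrap` invocation are illustrative; `jobs/jobs_value.py` arranges the actual jobs):

```bash
for seed in $(seq 0 9); do
    sbatch --wrap "python experiments/policy_value.py --policy_name GridWorld_5000 --seed $seed --n 1e5"
done
# after all jobs have finished:
python experiments/merge_values.py
```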
## Evaluation
### Option 1: Run in sequence
The main script for policy evaluation is `experiments/evaluate.py`. The following command is an example of Monte Carlo estimation for Robust On-Policy Acting with $\rho = 1.0$, for the policy `model_GridWorld_5000.pt`, with seeds from 0 to 199:

```bash
python experiments/evaluate.py --policy_name GridWorld_5000 --ros_epsilon 1.0 --collectors RobustOnPolicyActing --estimators MonteCarlo --eval_steps "7,14,29,59,118,237,475,951,1902,3805,7610,15221,30443,60886" --seeds "0,199"
```
To conduct policy evaluation with off-policy data, add the following arguments to the command above:

```bash
--combined_trajectories 100 --combined_ops_epsilon 0.10
```
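For instance, appending them to the GridWorld example above gives:

```bash
python experiments/evaluate.py --policy_name GridWorld_5000 --ros_epsilon 1.0 --collectors RobustOnPolicyActing --estimators MonteCarlo --eval_steps "7,14,29,59,118,237,475,951,1902,3805,7610,15221,30443,60886" --seeds "0,199" --combined_trajectories 100 --combined_ops_epsilon 0.10
```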
### Option 2: Run in parallel
If you can use `qsub` or `sbatch`, you only need to run the script `jobs/jobs.py`, where all experiments in the paper are arranged. Logs are saved in `log/` and per-seed results in `results/seeds`. Note that the data collection cache is saved in `results/data` and re-used for different value estimations. To merge the results of different seeds, run `experiments/merge_results.py`; the merged results will be saved in `results/`.
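Conceptually, merging just stacks the per-seed results and aggregates over the seed axis. A hypothetical sketch (the file names, `.npy` format, and array layout are assumptions; the actual logic lives in `experiments/merge_results.py`):

```python
import glob
import numpy as np

# assumed layout: one array of per-eval-step results per seed
runs = [np.load(path) for path in sorted(glob.glob("results/seeds/*.npy"))]
merged = np.stack(runs)                             # shape: (n_seeds, n_eval_steps)
np.save("results/merged.npy", merged.mean(axis=0))  # average over seeds
```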
## Plotting
Once the experiments have finished, all figures in the paper can be produced by running:

```bash
python drawing/draw.py
```
## Citing
If you use this repository in your work, please consider citing the paper:
```bibtex
@inproceedings{zhong2021robust,
    title     = {Robust On-Policy Data Collection for Data-Efficient Policy Evaluation},
    author    = {Rujie Zhong and Josiah P. Hanna and Lukas Schäfer and Stefano V. Albrecht},
    booktitle = {NeurIPS Workshop on Offline Reinforcement Learning (OfflineRL)},
    year      = {2021}
}
```