CURL: Contrastive Unsupervised Representations for Reinforcement Learning

Overview

CURL Rainbow

MIT License

Status: Archive (code is provided as-is, no updates expected)

This is an implementation of CURL: Contrastive Unsupervised Representations for Reinforcement Learning, coupled with the Data-Efficient Rainbow method, for Atari games. The code defaults to the 100k-timestep benchmark and has not been tested in any other setting.

Run the following command (or bash run_curl.sh) with the game as an argument:

python3 main.py --game ms_pacman

To install all dependencies, run bash install.sh.

Comments
  • illegal memory access

    When I run the code on a TITAN RTX with Python 3.6, CUDA 10.0.130, and PyTorch 1.4.0, I get the following error:

    File "curl_rainbow/agent.py", line 74, in act
        return (a * self.support).sum(2).argmax(1).item()
    RuntimeError: CUDA error: an illegal memory access was encountered

    The code runs fine on CPU, though. Does anyone know what is happening here?
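    A frequent trigger for this kind of CUDA error is a device mismatch, e.g. a buffer such as `self.support` left on CPU while the network output lives on GPU. The sketch below is illustrative only (shapes and names are assumptions; only the selection expression mirrors the traceback above):

    ```python
    import torch

    # Keep the support atoms on the same device as the network output;
    # mixing CPU and GPU tensors in one expression is a common cause of
    # "illegal memory access" errors on older PyTorch/CUDA combinations.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    atoms, n_actions = 51, 6
    support = torch.linspace(-10.0, 10.0, atoms, device=device)  # atom values
    a = torch.softmax(torch.randn(1, n_actions, atoms, device=device), dim=2)

    # Distributional-RL action selection: expected value per action, then argmax.
    action = (a * support).sum(2).argmax(1).item()
    ```

    Running with `CUDA_LAUNCH_BLOCKING=1` makes CUDA errors surface synchronously at the offending line, which helps localize the actual faulting kernel.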

    opened by Rivendile 3
  • T-max problem

    Why did you choose T-max = 100k in your experiments? I think you should train until convergence. I ran your code with T-max = 800k on Pong, and CURL and Rainbow showed the same sample efficiency. Even with T-max = 100k, I failed to reproduce the experimental results in the paper.

    opened by zhanghongjie101 3
  • About the evaluation choice

    Hi Aravind,

    I like your work very much, but I have one question about the evaluation. When I run your code directly on Battle Zone, the best performance sometimes occurs in the middle of training rather than at the end, and the final performance can be worse and fail to reach the scores reported in the paper. Did you report the best score over training, or only the final score? Thank you very much!

    opened by Cohencohenchen 1
  • can't reproduce on pong and hero

    I tried the code with the default parameters you set.

    But I can't reproduce the rewards on Pong and Hero reported in the paper. To be precise, I get ~-20 vs. -16.5 on Pong and ~3000 vs. ~6000 on Hero.

    Do you have any idea why this happened? I simply used the command:

    python main.py --game pong/hero

    opened by noahcao 1
  • About Frame skip

    The original CURL paper says the Atari benchmark uses a frame skip of 4, but in this code the default frame skip is 0.

    Is there something I am missing?
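    For context on the question above: in ALE-based environment wrappers like the one the Data-Efficient Rainbow codebase uses, frame skip is often applied inside the wrapper itself, repeating the action for four emulator steps and max-pooling the last two frames to remove sprite flicker, so an outer frame-skip option can default to 0 without the benchmark losing its skip. A hedged sketch (all names are illustrative, not the repo's actual env.py):

    ```python
    class ToyEnv:
        """Minimal stand-in emulator: the observation is a step counter."""
        def __init__(self):
            self.t = 0

        def step(self, action):
            self.t += 1
            return [self.t], 1.0, False  # obs, reward, done

    class FrameSkipEnv:
        """Repeat each action `skip` times; max-pool the last two frames."""
        def __init__(self, env, skip=4):
            self.env, self.skip = env, skip

        def step(self, action):
            total_reward, frames, done = 0.0, [], False
            for t in range(self.skip):
                obs, reward, done = self.env.step(action)
                total_reward += reward
                if t >= self.skip - 2:  # keep only the last two frames
                    frames.append(obs)
                if done:
                    break
            if len(frames) == 2:
                # element-wise max over the last two frames (flicker removal)
                obs = [max(a, b) for a, b in zip(*frames)]
            return obs, total_reward, done

    env = FrameSkipEnv(ToyEnv(), skip=4)
    obs, reward, done = env.step(0)
    # one wrapper step advances the emulator four frames and sums four rewards
    ```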

    opened by binmom 1
  • Where does the Query interact with the Rainbow DQN

    Based on my understanding of the paper, the query-key pairs go into the contrastive learning objective, while the queries go into the RL algorithm as observations. However, I am unable to find where in the code the query is passed into the Rainbow DQN as the state/observation. Can you please help me with this?
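    For readers with the same question: in CURL the same encoded query embedding serves both roles, it is scored against the key in the contrastive (InfoNCE-style) loss, and it is also the representation the RL head consumes. A hedged sketch of that data flow (dimensions and module names are assumptions, not the repo's actual code):

    ```python
    import torch
    import torch.nn as nn

    feat_dim, n_actions, batch = 32, 6, 8
    encoder = nn.Linear(3 * 84 * 84, feat_dim)        # query encoder (shared with RL)
    key_encoder = nn.Linear(3 * 84 * 84, feat_dim)    # momentum copy in CURL
    q_head = nn.Linear(feat_dim, n_actions)           # RL head on top of the encoder
    W = nn.Parameter(torch.randn(feat_dim, feat_dim)) # bilinear similarity matrix

    obs_q = torch.randn(batch, 3 * 84 * 84)  # "query" augmentation of the observation
    obs_k = torch.randn(batch, 3 * 84 * 84)  # differently augmented "key" crop

    z_q = encoder(obs_q)                      # query embedding
    with torch.no_grad():
        z_k = key_encoder(obs_k)              # key embedding (no gradient)

    # 1) Contrastive objective: positives sit on the diagonal of the logits.
    logits = z_q @ W @ z_k.t()
    contrastive_loss = nn.functional.cross_entropy(logits, torch.arange(batch))

    # 2) RL objective: the SAME query embedding feeds the Q-head.
    q_values = q_head(z_q)                    # shape (batch, n_actions)
    ```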

    opened by Deepakgthomas 0
Owner
Aravind Srinivas