# The Unsupervised Reinforcement Learning Benchmark (URLB)
URLB provides a set of leading algorithms for unsupervised reinforcement learning, where agents are first pre-trained without access to extrinsic rewards and then fine-tuned on downstream tasks.
## Requirements
We assume you have access to a GPU that can run CUDA 10.2 and CUDNN 8. The simplest way to install all required dependencies is to create an Anaconda environment by running

```sh
conda env create -f conda_env.yml
```

After the installation ends, you can activate your environment with

```sh
conda activate urlb
```
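Optionally, you can verify that PyTorch sees your GPU (assuming PyTorch is installed by `conda_env.yml`; this uses only PyTorch's standard CUDA query):

```sh
python -c "import torch; print(torch.cuda.is_available())"
```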
## Implemented Agents
Agent | Command | Implementation Author(s) | Paper |
---|---|---|---|
ICM | `agent=icm` | Denis | paper |
ProtoRL | `agent=proto` | Denis | paper |
DIAYN | `agent=diayn` | Misha | paper |
APT(ICM) | `agent=icm_apt` | Hao, Kimin | paper |
APT(Ind) | `agent=ind_apt` | Hao, Kimin | paper |
APS | `agent=aps` | Hao, Kimin | paper |
SMM | `agent=smm` | Albert | paper |
RND | `agent=rnd` | Kevin | paper |
Disagreement | `agent=disagreement` | Catherine | paper |
## Available Domains
We support the following domains.
Domain | Tasks |
---|---|
walker | `stand`, `walk`, `run`, `flip` |
quadruped | `walk`, `run`, `stand`, `jump` |
jaco | `reach_top_left`, `reach_top_right`, `reach_bottom_left`, `reach_bottom_right` |
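During fine-tuning, a downstream task is referred to as `<domain>_<task>`; for example, `walker_run` or `quadruped_jump` (see the fine-tuning instructions below).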
### Domain observation mode
Each domain supports two observation modes: states and pixels.
Mode | Command |
---|---|
states | `obs_type=states` |
pixels | `obs_type=pixels` |
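For example, combining the flags documented above with the `pretrain.py` script described below, you can pre-train ICM from pixel observations on walker:

```sh
python pretrain.py agent=icm domain=walker obs_type=pixels
```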
## Instructions
### Pre-training
To run pre-training, use the `pretrain.py` script:

```sh
python pretrain.py agent=icm domain=walker
```
or, if you want to train a skill-based agent, like DIAYN, run:

```sh
python pretrain.py agent=diayn domain=walker
```
This script will produce several agent snapshots after training for 100k, 500k, 1M, and 2M frames. The snapshots will be stored under the following directory:

```
./pretrained_models/<obs_type>/<domain>/<agent>/
```

For example:

```
./pretrained_models/states/walker/icm/
```
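Since snapshots use the `.pt` extension, they can presumably be inspected with `torch.load`. A minimal sketch (an assumption, not documented behavior; the exact payload layout depends on what `pretrain.py` saves):

```python
import torch

# Hypothetical inspection of a pre-trained snapshot; assumes the .pt file is
# a standard torch.save payload. Loading full agent objects may additionally
# require the URLB source tree on PYTHONPATH so pickle can resolve classes.
payload = torch.load(
    './pretrained_models/states/walker/icm/snapshot_1000000.pt',
    map_location='cpu')
print(type(payload))
if isinstance(payload, dict):
    print(list(payload.keys()))
```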
### Fine-tuning
Once you have pre-trained your method, you can use the saved snapshots to initialize the `DDPG` agent and fine-tune it on a downstream task. For example, let's say you have pre-trained `ICM`; you can fine-tune it on `walker_run` by running the following command:

```sh
python finetune.py pretrained_agent=icm task=walker_run snapshot_ts=1000000 obs_type=states
```

This will load a snapshot stored in `./pretrained_models/states/walker/icm/snapshot_1000000.pt`, initialize `DDPG` with it (both the actor and critic), and start training on `walker_run` using the extrinsic reward of the task.
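Given the snapshot naming above, `snapshot_ts` appears to correspond to the frame count of the snapshot to restore. For instance, assuming that naming, fine-tuning from the 100k-frame snapshot instead would look like:

```sh
python finetune.py pretrained_agent=icm task=walker_run snapshot_ts=100000 obs_type=states
```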
For methods that use skills, you also need to pass the agent and set the `reward_free` tag to false:

```sh
python finetune.py pretrained_agent=smm task=walker_run snapshot_ts=1000000 obs_type=states agent=smm reward_free=false
```
## Monitoring
Logs are stored in the `exp_local` folder. To launch tensorboard run:

```sh
tensorboard --logdir exp_local
```
The console output is also available in the following form:

```
| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42
```
A training entry decodes as:

```
F  : total number of environment frames
S  : total number of agent steps
E  : total number of episodes
L  : episode length
R  : episode return
FPS: training throughput (frames per second)
T  : total training time
```
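As an illustration of the format, a console entry can be parsed back into a dictionary. This is a hypothetical helper, not part of URLB:

```python
def parse_log_line(line: str) -> dict:
    """Parse a console entry like '| train | F: 6000 | ... | T: 0:00:42'."""
    # Drop the leading pipe, then split the remaining fields on '|'.
    fields = [field.strip() for field in line.strip().strip('|').split('|')]
    mode, *pairs = fields
    entry = {'mode': mode}
    for pair in pairs:
        # Split only on the first ':' so times like '0:00:42' stay intact.
        key, value = pair.split(':', 1)
        entry[key.strip()] = value.strip()
    return entry

print(parse_log_line(
    '| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 '
    '| FPS: 96.7586 | T: 0:00:42'))
```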