CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

Overview

Oier Mees, Lukas Hermann, Erick Rosete-Beas, Wolfram Burgard

We present CALVIN (Composing Actions from Language and Vision), an open-source simulated benchmark for learning long-horizon language-conditioned tasks. Our aim is to make it possible to develop agents that can solve many robotic manipulation tasks over a long horizon, from onboard sensors, and specified only via human language. CALVIN tasks are more complex in terms of sequence length, action space, and language than existing vision-and-language task datasets, and the benchmark supports flexible specification of sensor suites.

💻 Quick Start

To begin, clone this repository locally:

$ git clone --recurse-submodules https://github.com/mees/calvin.git
$ export CALVIN_ROOT=$(pwd)/calvin

Install requirements:

$ cd $CALVIN_ROOT
$ virtualenv -p $(which python3) --system-site-packages calvin_env # or use conda
$ source calvin_env/bin/activate
$ sh install.sh

Download the dataset (choose which split to download via the argument: D, ABC, or ABCD):

$ cd $CALVIN_ROOT/dataset
$ sh download_data.sh D | ABC | ABCD

🏋️‍♂️ Train Baseline Agent

Train baseline models:

$ cd $CALVIN_ROOT/calvin_models/calvin_agent
$ python training.py

Want to scale your training to a multi-GPU setup? Just specify the number of GPUs, and PyTorch Lightning will automatically use DDP for training. To train on all available GPUs:

$ python training.py trainer.gpus=-1

If you have access to a Slurm cluster, we also provide training scripts here.

You can use Hydra's flexible overriding system to change hyperparameters. For example, to train a model with RGB images from both the static camera and the gripper camera:

$ python training.py datamodule/observation_space=lang_rgb_static_gripper model/perceptual_encoder=gripper_cam

To train a model with RGB-D from both cameras:

$ python training.py datamodule/observation_space=lang_rgbd_both model/perceptual_encoder=RGBD_both

To train a model with RGB images from the static camera and visual-tactile observations:

$ python training.py datamodule/observation_space=lang_rgb_static_tactile model/perceptual_encoder=static_RGB_tactile

To see all available hyperparameters:

$ python training.py --help

To resume a training run, just override the Hydra working directory:

$ python training.py hydra.run.dir=runs/my_dir

🖼️ Sensory Observations

CALVIN supports a range of sensors commonly used for visuomotor control; a sample frame layout is sketched after the list:

  1. Static camera RGB images - with shape 200x200x3.
  2. Static camera depth maps - with shape 200x200x1.
  3. Gripper camera RGB images - with shape 200x200x3.
  4. Gripper camera depth maps - with shape 200x200x1.
  5. Tactile images - with shape 120x160x2x3.
  6. Proprioceptive state - EE position (3), EE orientation in Euler angles (3), gripper width (1), joint positions (7), gripper action (1).
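
For illustration, a single frame of data bundles these modalities roughly as follows. This is a sketch only: the key names and dtypes are assumptions, not the exact dataset schema.

import numpy as np

frame = {
    "rgb_static": np.zeros((200, 200, 3), dtype=np.uint8),      # static camera RGB
    "depth_static": np.zeros((200, 200, 1), dtype=np.float32),  # static camera depth
    "rgb_gripper": np.zeros((200, 200, 3), dtype=np.uint8),     # gripper camera RGB
    "depth_gripper": np.zeros((200, 200, 1), dtype=np.float32), # gripper camera depth
    "rgb_tactile": np.zeros((120, 160, 2, 3), dtype=np.uint8),  # two 120x160 RGB tactile images
    "robot_obs": np.zeros(15, dtype=np.float32),                # 3 + 3 + 1 + 7 + 1 proprioceptive values
}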

🕹️ Action Space

In CALVIN, the agent must perform closed-loop continuous control to follow unconstrained language instructions characterizing complex robot manipulation tasks, sending continuous actions to the robot at 30 Hz. To give researchers and practitioners the freedom to experiment with different action spaces, CALVIN supports the following action spaces (an example action vector follows the list):

  1. Absolute Cartesian pose - EE position (3), EE orientation in Euler angles (3), gripper action (1).
  2. Relative Cartesian displacement - EE position (3), EE orientation in Euler angles (3), gripper action (1).
  3. Joint action - Joint positions (7), gripper action (1).
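
For example, a relative Cartesian displacement action is a 7-dimensional vector. The ordering below follows item 2 above; the gripper sign convention (1 = open, -1 = close) is an assumption of this sketch.

import numpy as np

# [dx, dy, dz, d_roll, d_pitch, d_yaw, gripper]
action = np.array([0.0, 0.0, 0.05, 0.0, 0.0, 0.0, -1.0])  # small upward displacement while closing the gripper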

💪 Evaluation: The CALVIN Challenge

Long-horizon Multi-task Language Control (LH-MTLC)

The aim of the CALVIN benchmark is to evaluate the learning of long-horizon language-conditioned continuous control policies. In this setting, a single agent must solve complex manipulation tasks by understanding a series of unconstrained language expressions in a row, e.g., "open the drawer... pick up the blue block... now push the block into the drawer... now open the sliding door". We provide an evaluation protocol with modes of varying difficulty, defined by different combinations of sensor suites and numbers of training environments. To avoid a biased initial position, the robot is reset to a neutral position before every multi-step sequence.
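
For intuition, the long-horizon score reports how far the agent gets through each chain of instructions. A toy computation of per-chain-length success rates, assuming chains of 5 instructions:

# results: for each evaluated sequence, the number of consecutively solved subtasks
results = [5, 0, 2, 1, 3, 0, 5, 4]

for k in range(1, 6):
    rate = sum(r >= k for r in results) / len(results)
    print(f"{k} tasks in a row: {rate:.2%}")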

To evaluate a trained CALVIN baseline agent, run the following commands:

$ cd $CALVIN_ROOT/calvin_models/calvin_agent
$ python evaluation/evaluate_policy.py --dataset_path <PATH/TO/DATASET> --train_folder <PATH/TO/TRAINING/FOLDER>

Optional arguments:

  • --checkpoint <PATH/TO/CHECKPOINT>: by default, the evaluation loads the last checkpoint in the training log directory. You can instead specify the path to a different checkpoint by adding this argument to the evaluation command.
  • --debug: print debug information and visualize the environment.

If you want to evaluate your own model architecture on the CALVIN challenge, you can implement the CustomModel class in evaluate_policy.py as an interface to your agent (see the sketch after this list). You need to implement the following methods:

  • __init__(): gets called once at the beginning of the evaluation.
  • reset(): gets called at the beginning of each evaluation sequence.
  • step(obs, goal): gets called every step and returns the predicted action.
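
A minimal skeleton of such a class, assuming only the three methods listed above; the zero action returned by step() is a placeholder, not a meaningful policy:

import numpy as np

class CustomModel:
    def __init__(self):
        # Called once at the beginning of the evaluation:
        # load your checkpoint and build your network here.
        pass

    def reset(self):
        # Called at the beginning of each evaluation sequence:
        # clear any recurrent state or action buffers.
        pass

    def step(self, obs, goal):
        # Called every step; must return the predicted action.
        # Placeholder: a 7-dim zero action (EE displacement + gripper).
        return np.zeros(7)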

Then evaluate the model by running:

$ python evaluation/evaluate_policy.py --dataset_path <PATH/TO/DATASET> --custom_model

You are also free to use your own language model instead of the precomputed language embeddings provided by CALVIN. For this, implement CustomLangEmbeddings in evaluate_policy.py and add --custom_lang_embeddings to the evaluation command.
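
A rough sketch of what such a class could look like; the method name get_lang_goal is hypothetical (check the interface evaluate_policy.py expects), and the SBert model choice is arbitrary:

from sentence_transformers import SentenceTransformer

class CustomLangEmbeddings:
    def __init__(self):
        # Hypothetical example: embed instructions with an off-the-shelf SBert model.
        self.model = SentenceTransformer("paraphrase-MiniLM-L6-v2")

    def get_lang_goal(self, instruction):
        # Return the embedding for a single instruction string.
        return self.model.encode([instruction])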

Multi-task Language Control (MTLC)

Alternatively, you can evaluate the policy on single tasks, without resetting the robot to a neutral position. Note that this evaluation is currently only available for our baseline agent.

$ python evaluation/evaluate_policy_singlestep.py --dataset_path <PATH/TO/DATASET> --train_folder <PATH/TO/TRAINING/FOLDER> [--checkpoint <PATH/TO/CHECKPOINT>] [--debug]

Pre-trained Model

Download the MCIL model checkpoint trained on static camera RGB images of environment D:

$ wget http://calvin.cs.uni-freiburg.de/model_weights/D_D_static_rgb_baseline.zip
$ unzip D_D_static_rgb_baseline.zip

💬 Relabeling Raw Language Annotations

Want to try learning language-conditioned policies in CALVIN with a new, awesome language model?

We provide an example script to relabel the annotations with different language models provided by SBert, such as the larger MPNet (paraphrase-mpnet-base-v2) or its corresponding multilingual model (paraphrase-multilingual-mpnet-base-v2). The supported options are "mini", "mpnet" and "multi". If you want to try other SBert models, just change the model name here.

$ cd $CALVIN_ROOT/calvin_models/calvin_agent
$ python utils/relabel_with_new_lang_model.py +path=$CALVIN_ROOT/dataset/task_D_D/ +name_folder=new_lang_model_folder model.nlp_model=mpnet

If you additionally want to sample different language annotations for each sequence (from the same task annotations) in the training split, run the same command with the parameter reannotate=true.
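
Under the hood, relabeling amounts to re-encoding each annotation string with the chosen SBert model. A standalone sketch of that step (the model name matches the "mpnet" option above):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-mpnet-base-v2")
embeddings = model.encode(["open the drawer", "pick up the blue block"])
print(embeddings.shape)  # (2, 768) -- MPNet-based SBert models produce 768-dim embeddings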

📈 SOTA Models

Open-source models that outperform the MCIL baselines from CALVIN:

Contact Oier to add your model here.

Reinforcement Learning with CALVIN

Interested in trying reinforcement learning agents on the different manipulation tasks in the CALVIN environment? We provide a Google Colab showcasing how to leverage the CALVIN task indicators to train RL agents with a sparse reward.
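
A minimal sketch of the idea, assuming a gym-style env and a task oracle with a get_task_info(start_info, end_info) check (as referenced in the issues below); the wrapper details are assumptions, so see the Colab for the actual code:

class SparseRewardWrapper:
    """Wrap a CALVIN env: reward 1.0 once the target task is detected as solved, else 0.0."""

    def __init__(self, env, tasks, task_name):
        self.env, self.tasks, self.task_name = env, tasks, task_name

    def reset(self):
        obs = self.env.reset()
        self.start_info = self.env.get_info()  # assumption: env exposes its state info
        return obs

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        solved = self.task_name in self.tasks.get_task_info(self.start_info, info)
        return obs, float(solved), done or solved, info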

Citation

If you find the dataset or code useful, please cite:

@article{calvin21,
author = {Oier Mees and Lukas Hermann and Erick Rosete-Beas and Wolfram Burgard},
title = {CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks},
journal={arXiv preprint arXiv:2112.03227},
year = 2021,
}

License

MIT License

Comments
  • I am trying to follow the instructions to run the code; after resolving some errors, there are some errors I cannot fix

    The commands I executed:
    $ cd $CALVIN_ROOT/calvin_models/calvin_agent
    $ python training.py

    The error I got after fixing some others:

    File "/home/nikepupu/anaconda3/envs/calvin/lib/python3.7/site-packages/hydra/_internal/utils.py", line 573, in _locate
        raise ImportError(f"Error loading module '{path}'") from e
    ImportError: Error loading module 'lfp.utils.transforms.NormalizeVector'

    I investigated the GitHub repository and found that this class comes from a module in the original learning_from_play codebase, which is missing here:

    python /home/hermannl/repos/learning_from_play/lfp/training.py

    In addition, I needed to modify utils/transforms.py to get to this point.

    I modified ScaleImageTensor into this NormalizeVector class (the float-to-list handling in __init__ is my change):

    class NormalizeVector(object):
        """Normalize a tensor vector with mean and standard deviation."""

        def __init__(self, mean=0.0, std=1.0):
            if isinstance(mean, float):
                mean = [mean]
            if isinstance(std, float):
                std = [std]
            print("success")
            self.std = torch.Tensor(std)
            self.std[self.std == 0.0] = 1.0
            self.mean = torch.Tensor(mean)

        def __call__(self, tensor: torch.Tensor) -> torch.Tensor:
            assert isinstance(tensor, torch.Tensor)
            return (tensor - self.mean) / self.std

        def __repr__(self):
            return self.__class__.__name__ + "(mean={0}, std={1})".format(self.mean, self.std)
    

    I also modified AddDepthNoise to this (the rewritten __repr__ is my change):

    class AddDepthNoise(object):
        """Add multiplicative gamma noise to depth image.
        Adapted from the DexNet 2.0 code:
        https://github.com/BerkeleyAutomation/gqcnn/blob/master/gqcnn/training/tf/trainer_tf.py"""

        def __init__(self, shape=1000.0, rate=1000.0):
            self.shape = torch.tensor(shape)
            self.rate = torch.tensor(rate)
            self.dist = torch.distributions.gamma.Gamma(torch.tensor(shape), torch.tensor(rate))

        def __call__(self, tensor: torch.Tensor) -> torch.Tensor:
            assert isinstance(tensor, torch.Tensor)
            multiplicative_noise = self.dist.sample()
            return multiplicative_noise * tensor

        def __repr__(self):
            # return self.__class__.__name__ + f"{self.shape=},{self.rate=},{self.dist=}"
            return self.__class__.__name__ + "(shape={0}, rate={1}, dist={2})".format(self.shape, self.rate, self.dist)
    
    opened by nikepupu 15
  • Stuck at beginning of training

    Hi!

    Running the baseline training gets stuck at the very beginning. Do you have any clue why that might be? Is it normal for iterations to take 23.91s/it? There is no error.

    The only difference from your requirements is the PyTorch version, as only the nightly release seems to work with CUDA and PyTorch Lightning 1.4.9 on our machine.

    root_data_dir: /media/dennisushi/DREVO-P1/DATA/calvin/task_D_D/task_A_A
    ...
    slurm: false
    ...
    [2022-03-17 11:16:03,707][__main__][INFO] - * CUDA:
    	- GPU:
    		- GeForce RTX 3080
    		- GeForce RTX 3080
    	- available:         True
    	- version:           11.1
    * Packages:
    	- numpy:             1.21.2
    	- pyTorch_debug:     False
    	- pyTorch_version:   1.12.0.dev20220224+cu111
    	- pytorch-lightning: 1.4.9
    	- tqdm:              4.63.0
    ...
    ...
    [2022-03-17 11:16:22,988][calvin_agent.models.play_lmp][INFO] - Finished validation epoch 0
    Global seed set to 42                                                                                                
    Epoch 0:   0%|                                                                   | 0/19063 [00:00<00:04, 3979.42it/s][2022-03-17 11:16:23,004][calvin_agent.models.play_lmp][INFO] - Start training epoch 0
    Epoch 0:   0%|                                          | 3/19063 [01:35<126:35:39, 23.91s/it, loss=42.1, v_num=6-02]
    
    opened by dennisushi 10
  • Feature request: single task datasets

    Hi there! I think it would be really nice if there were a script and dataset for a selection of individual tasks in CALVIN, so that one could test a method on just a single task. I've started working on this already; does it sound like a useful feature?

    opened by ezhang7423 8
  • Resetting env to state from dataset

    Hi,

    I'm trying to generate skill ID / language annotations for the unlabeled frames in the dataset. I was thinking of using the reset_from_storage method in the environment class to reset to a state from the dataset and then using the task checker to check for task success. However, the reset function requires a serialized version of the env/robot state, which is not provided. Is there a way I could reset the env from offline data, or is there another way to get skill annotations for the entire dataset?

    Thanks!

    opened by aliang8 7
  • Major concern about evaluation

    Hi there! I've found that rolling out ground-truth trajectories (labelled by the language annotator) from the dataset is not always evaluated as successful by Tasks.get_task_info. This seems quite concerning. Perhaps I've done something wrong on my end?

    To reproduce, I have forked the repo with minimal changes here: https://github.com/mees/calvin/pull/33. The only difference is on line 47 of calvin_models/calvin_agent/evaluation/evaluate_policy_singlestep.py, where instead of rolling out the model I roll out the dataset actions.

    The exact commands I ran from beginning to end:

    # set up environment
    git clone git@github.com:ezhang7423/calvin.git --recursive
    cd calvin
    conda create --name calvin python=3.8
    conda activate calvin
    pip install setuptools==57.5.0 torchmetrics==0.6.0
    ./install.sh
    
    # get pretrained weights and fix the config.yaml
    cp ./D_D_static_rgb_baseline/.hydra/config.yaml ./tmp.yaml
    wget http://calvin.cs.uni-freiburg.de/model_weights/D_D_static_rgb_baseline.zip
    unzip D_D_static_rgb_baseline.zip
    mv ./tmp.yaml ./D_D_static_rgb_baseline/.hydra/config.yaml
    
    # get data
    cd dataset
    ./download_data.sh D
    cd ../
    
    # run the evaluation
    python calvin_models/calvin_agent/evaluation/evaluate_policy_singlestep.py --dataset_path $DATA_GRAND_CENTRAL/task_D_D/ --train_folder ./D_D_static_rgb_baseline/ --checkpoint D_D_static_rgb_baseline/mcil_baseline.ckpt
    
    opened by ezhang7423 5
  • The proportion of the recorded robot interaction data with language instructions

    Hi,

    Thanks for your excellent benchmark!

    I have a question regarding the proportion of the recorded robot interaction data with language instructions.

    The CALVIN paper says that "we annotate only 1% of the recorded robot interaction data with language instructions."

    After I downloaded the dataset "task_D_D", "ep_start_end_ids.npy" under the training folder records 512046 unique episodes (saved as .npz files). Under the "training/lang_annotations" folder, "auto_lang_ann.npy" records 303794 episodes, of which 192607 are unique. It thus seems that 192607 episodes out of all 512046 episodes in the training set are annotated with language instructions. The proportion is 192607/512046, which differs from the 1% in the paper.

    If my analysis is correct, why is the proportion of recorded robot interaction data with language instructions in the dataset different from that in the paper?

    Looking forward to your reply.

    opened by geyuying 5
  • "InvalidGitRepositoryError" while running jupyter notebook

    Hi, I'd like to run the RL_with_CALVIN.ipynb file in my locally established environment, but I hit the following issue when running the line env = hydra.utils.instantiate(cfg.env): InvalidGitRepositoryError: Error instantiating 'calvin_env.envs.play_table_env.PlayTableSimEnv'. I don't know how to solve it, so please give me some pointers.

    P.S. There also seem to be some script issues when opening RL_with_CALVIN.ipynb.

    opened by 2000222 5
  • Language annotations from the automatic annotation tool

    Hi,

    I visualized episodes with language instructions by sampling 4 images ("rgb_static") within each episode in the ABC training set (task_ABC_D), and some language instructions ("task" and "ann") seem to be wrong, as shown below:

    35_744905_744969_grasp the blue block and rotate it right

    The language instruction of the above episode is "grasp the blue block and rotate it right". You can check the example whose ['info']['indx'] is (744905, 744969). Such cases are not rare in the ABC training set, but not in the D training and validation sets.

    I further used the automatic annotation tool to re-annotate the episodes, as shown in https://github.com/mees/calvin/issues/24, and got the same task information ("rotate_blue_block_right" in the above example) as in the downloaded "auto_lang_ann.npy".

    Is there any way to make the language annotations more accurate?

    Looking forward to your reply.

    Best regards, Yuying

    opened by geyuying 4
  • `scripts/automatic_lang_annotator_mp.py` appears to be broken

    The configuration file that this script uses, lang_ann.yml, appears to be missing two required keys: rollout_sentences and annotations. To replicate this, simply try running the script.

    opened by ezhang7423 4
  • Simple script to visualize and run through data from a CALVIN dataset.

    In reference to #20, I submit this very rudimentary visualization script. It simply finds all the episode files in a given folder and then lets the user scrub through the data with the arrow keys. It could benefit from a tqdm indicator showing where one is in the dataset and from keys to skip larger parts.

    opened by ARoefer 4
  • Errors with EGL

    Thanks for this work. When I followed the README and ran python training.py datamodule.root_data_dir=/path/to/dataset/, it reported Segmentation fault (core dumped) when loading the EGL plugin in calvin_env.

    I use an Ubuntu 16.04 server with an NVIDIA 2080 Ti card; the driver is nvidia-container-runtime 3.5.0-1 and CUDA is 11.2. I have searched the Internet for a while, e.g., installing Mesa via sudo apt-get install libglfw3-dev libgles2-mesa-dev, but it still did not work.

    Could you advise how to enable EGL with my hardware setup, and what EGL is used for? Is it for rendering the robot?

    By the way, how long does training the baseline take on each of the three provided datasets, i.e., task_D.zip, task_ABC_D.zip, task_ABCD_D.zip?

    Thanks very much.

    opened by zhaozj89 4