Repo for our ICML21 paper Unsupervised Learning of Visual 3D Keypoints for Control

Boyuan Chen

Last update: Jul 22, 2022

Related tags

Deep Learning unsup-3d-keypoints

Overview

Unsupervised Learning of Visual 3D Keypoints for Control

[Project Website] [Paper]

Boyuan Chen¹, Pieter Abbeel¹, Deepak Pathak²
¹UC Berkeley ²Carnegie Mellon University

This is the code base for our paper on unsupervised learning of visual 3d keypoints for control. We propose an unsupervised learning method that learns temporally-consistent 3d keypoints via interaction. We jointly train an RL policy with the keypoint detector and shows 3d keypoints improve the sample efficiency of task learning in a variety of environments. If you find this work helpful to your research, please cite us as:

@inproceedings{chen2021unsupervised,
    title={Unsupervised Learning of Visual 3D Keypoints for Control},
    author={Boyuan Chen and Pieter Abbeel and Deepak Pathak},
    year={2021},
    Booktitle={ICML}
}

Environment Setup

If you hope to run meta-world experiments, make sure you have your mujoco binaries and valid license key in ~/.mujoco. Otherwise, you should edit the requirements.txt to remove metaworld and mujoco-py accordingly to avoid errors.

# clone this repo
git clone https://github.com/buoyancy99/unsup-3d-keypoints
cd unsup-3d-keypoints

# setup conda environment
conda create -n keypoint3d python=3.7.5
conda activate keypoint3d
pip3 install -r requirements.txt

Run Experiments

When training, all logs will be stored at data/, visualizations will be stored in images/ and all check points at ckpts/. You may use tensorboard to visualize training log or plotting the monitor files.

Quick start with pre-trained weights

# Visualize metaworld-hammer environment
python3 visualize.py --algo ppokeypoint -t hammer -v 1 -m 3d -j --offset_crop --decode_first_frame --num_keypoint 6 --decode_attention --seed 99 -u -e 0007

# Visualize metaworld-close-box environment
python3 visualize.py --algo ppokeypoint -t bc -v 1 -m 3d -j --offset_crop --decode_first_frame --num_keypoint 6 --decode_attention --seed 99 -u -e 0008

Reproduce the keypoints similiar to the two pre-trained checkpoints

# To reproduce keypoints visualization similiar to the above two checkpoints, use these commands
# Feel free to try any seed using [--seed]. Seeding makes training deterministic on each machine but has no guarantee across devices if using GPU. Thus you might not get the exact checkpoints as me if GPU models differ but resulted keypoints should look similiar. 

python3 train.py --algo ppokeypoint -t hammer -v 1 -e 0007 -m 3d -j --total_timesteps 6000000 --offset_crop --decode_first_frame --num_keypoint 6 --decode_attention --seed 200 -u

python3 train.py --algo ppokeypoint -t bc -v 1 -e 0008 -m 3d -j --total_timesteps 6000000 --offset_crop --decode_first_frame --num_keypoint 6 --decode_attention --seed 200 -u

Train & Visualize Pybullet Ant with Keypoint3D(Ours)

# use -t antnc to train ant with no color 
python3 train.py --algo ppokeypoint -t ant -v 1 -e 0001 -m 3d --frame_stack 2 -j --total_timesteps 5000000 --num_keypoint 16 --latent_stack --decode_first_frame --offset_crop --mean_depth 1.7 --decode_attention --separation_coef 0.005 --seed 99 -u

# After checkpoint is saved, visualize
python3 visualize.py --algo ppokeypoint -t ant -v 1 -e 0001 -m 3d --frame_stack 2 -j --total_timesteps 5000000 --num_keypoint 16 --latent_stack --decode_first_frame --offset_crop --mean_depth 1.7 --decode_attention --separation_coef 0.005 --seed 99 -u

Train Pybullet Ant with baselines

# RAD PPO baseline
python3 train.py --algo pporad -t ant -v 1 -e 0002 --total_timesteps 5000000 --frame_stack 2 --seed 99 -u

# Vanilla PPO baseline
python3 train.py --algo ppopixel -t ant -v 1 -e 0003 --total_timesteps 5000000 --frame_stack 2 --seed 99 -u

Train & Visualize 'Close-Box' environment in Meta-world with Keypoint3D(Ours)

python3 train.py --algo ppokeypoint -t bc -v 1 -e 0004 -m 3d -j --offset_crop --decode_first_frame --num_keypoint 32 --decode_attention --total_timesteps 4000000 --seed 99 -u

# After checkpoint is saved, visualize
python3 visualize.py --algo ppokeypoint -t bc -v 1 -e 0004 -m 3d -j --offset_crop --decode_first_frame --num_keypoint 32 --decode_attention --total_timesteps 4000000 --seed 99 -u

Train 'Close-Box' environment in Meta-world with baselines

# RAD PPO baseline
python3 train.py --algo pporad -t bc -v 1 -e 0005 --total_timesteps 4000000 --seed 99 -u

# Vanilla PPO baseline
python3 train.py --algo ppopixel -t bc -v 1 -e 0006 --total_timesteps 4000000 --seed 99 -u

Other environments in general

# Any training command follows the following format
python3 train.py -a [algo name] -t [env name] -v [env version] -e [experiment id] [...]

# Any visualization command is simply using the same options but run visualize.py instead of train.py
python3 visualize.py -a [algo name] -t [env name] -v [env version] -e [experiment id] [...]

# For colorless ant, you can change the ant example's [-t ant] flag to [-t antnc]
# For metaworld, you can change the close-box example's [-t bc] flag to other abbreviations such as [-t door] etc.

# For a full list of arugments and their meanings,
python3 train.py -h

Update Log

Data	Notes
Jun/15/21	Initial release of the code. Email me if you have questions or find any errors in this version.
Jun/16/21	Add all metaworld environments with notes about placeholder observations

You might also like...

Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [2021]

Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations This repo contains the Pytorch implementation of our paper: Revisit

80 Nov 20, 2022

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

README.md shall be finished soon. WSSGG 0 Overview 1 Installation 1.1 Faster-RCNN 1.2 Language Parser 1.3 GloVe Embeddings 2 Settings 2.1 VG-GT-Graph

35 Nov 20, 2022

Pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering".

TRAnsformer Routing Networks (TRAR) This is an official implementation for ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visu

49 Nov 10, 2022

This is the official pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" on VQA Task

🌈 ERASOR (RA-L'21 with ICRA Option) Official page of "ERASOR: Egocentric Ratio of Pseudo Occupancy-based Dynamic Object Removal for Static 3D Point C

225 Dec 29, 2022

ROS-UGV-Control-Interface - Control interface which can be used in any UGV

ROS-UGV-Control-Interface Cam Closed: Cam Opened:

1 Nov 4, 2022

Hand Gesture Volume Control is AIML based project which uses image processing to control the volume of your Computer.

Hand Gesture Volume Control Modules There are basically three modules Handtracking Program Handtracking Module Volume Control Program Handtracking Pro

1 Jan 12, 2022

KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control

KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control Tomas Jakab, Richard Tucker, Ameesh Makadia, Jiajun Wu, Noah Snavely, Angjoo Ka

87 Nov 30, 2022

Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation (CoRL 2021)

Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation [Project website] [Paper] This project is a PyTorch i

Cognitive Learning for Vision and Robotics (CLVR) lab @ USC

6 Feb 28, 2022

Official repo for the work titled "SharinGAN: Combining Synthetic and Real Data for Unsupervised GeometryEstimation"

SharinGAN Official repo for the work titled "SharinGAN: Combining Synthetic and Real Data for Unsupervised GeometryEstimation" The official project we

23 Oct 19, 2022

Comments

Program gets killed while running & Request for CURL baseline

Hi, Thank you for your great work and the kind release of your implementation!

However, when I tried to run your code as described in README, the program got "Killed" after running for 10+ hours. Do you have any idea what is going on?

In addition, would you like to share with us the CURL baseline implementation in this framework?

Appreciate your help.

opened by wwwwwyyyyyxxxxx 3
2D keypoint network implementation {Minderer et al. 2019}

Hi, thanks for sharing this awesome work to our community!

In the paper, authors compare the proposed method with 2D keypoint representation, which is originally proposed by Minderer et al. 2019.

While investigating the code implementation, I got confused by the '2D' argument of the keypoint-based PPO algorithm. Is parsing with '2D' argument instantiates the representation learning proposed by Minderer et al. or just the 2D fraction of this work? From the code, it seems the 2D version of the keypoint network is not related to that of Minderer et al.

Thanks for your help in advance!

opened by mch5048 2
Questions and request about CURL implementation
Hi, Thank you for your great work and the kind release of your implementation!

Please note that this is not necessarily an issue, but a set of questions and requests.

While trying to carefully reproduce results in the paper, I have some questions regarding CURL experiment setup in the multi-view setting. Specifically, I have the following 4 questions (and requests) in your CURL implementation:

Did you replace SAC with PPO in CURL, the same as other baseline methods?

Are image augmentation techniques independently applied to query and key observations?

Did you use the momentum encoder to encode key observations?

Can you provide your own CURL implementation? (it would be best to resolve all of my questions!)

Thank you!
opened by gr8joo 2
Dependency issues are raised when I try to use Mujoco-210
Hi Boyuan, Thank you for your insightful work along with providing your implementation in public!

I found before that your code works like a charm when I used my paid Mujoco-200. Although it is expired now, Mujoco has recently released their license for free whose version is 2.1.0 or higher.

I tried to use Mujoco-2.1.x by modifying the line 23 in requirements.txt (mujoco-py==2.0.2.13) to mujoco-py==2.1.2.14, and what I observed is 'mujoco-py==2.1.2.14' conflicts to 'git+git://github.com/buoyancy99/metaworld@unsup-3d-keypoints', the line 48 in requirements.txt as below:

ERROR: Cannot install -r requirements.txt (line 48) and mujoco-py==2.1.2.14 because these package versions have conflicting dependencies. The conflict is caused by: The user requested mujoco-py==2.1.2.14 metaworld 0.0.0 depends on mujoco-py<2.1 and >=2.0 To fix this you could try to: 1. loosen the range of package versions you've specified 2. remove package versions to allow pip attempt to solve the dependency conflict ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

Considering that we should shift to using free-licensed mujoco in the end (since all the paid licenses will be expired), would you modify requirements.txt (and your related git pages such as the last three lines in the requirement.txt) to support the latest free mujoco (Mujoco-2.1.x)?

Thank you!
opened by gr8joo 2

Owner

Boyuan Chen

PhD at MIT studying ML + Robotics

GitHub

A very simple baseline to estimate 2D & 3D SMPL-compatible keypoints from a single color image.

Minimal Body A very simple baseline to estimate 2D & 3D SMPL-compatible keypoints from a single color image. The model file is only 51.2 MB and runs a

49 Dec 5, 2022

Unofficial PyTorch implementation of "RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving" (ECCV 2020)

RTM3D-PyTorch The PyTorch Implementation of the paper: RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving (ECCV 2020

271 Nov 29, 2022

Models, datasets and tools for Facial keypoints detection

Template for Data Science Project This repo aims to give a robust starting point to any Data Science related project. It contains readymade tools setu

1 Feb 11, 2022

This is the code for CVPR 2021 oral paper: Jigsaw Clustering for Unsupervised Visual Representation Learning

JigsawClustering Jigsaw Clustering for Unsupervised Visual Representation Learning Pengguang Chen, Shu Liu, Jiaya Jia Introduction This project provid

73 Sep 18, 2022

This repo is a PyTorch implementation for Paper "Unsupervised Learning for Cuboid Shape Abstraction via Joint Segmentation from Point Clouds"

Unsupervised Learning for Cuboid Shape Abstraction via Joint Segmentation from Point Clouds This repository is a PyTorch implementation for paper: Uns

42 Dec 9, 2022

[AAAI2021] The source code for our paper 《Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion》.

DSM The source code for paper Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion Project Website; Datasets li

114 Oct 16, 2022

PyTorch implementation of our Adam-NSCL algorithm from our CVPR2021 (oral) paper "Training Networks in Null Space for Continual Learning"

Adam-NSCL This is a PyTorch implementation of Adam-NSCL algorithm for continual learning from our CVPR2021 (oral) paper: Title: Training Networks in N

34 Dec 21, 2022

Repo for our ICML21 paper Unsupervised Learning of Visual 3D Keypoints for Control

Related tags

Overview

Unsupervised Learning of Visual 3D Keypoints for Control

[Project Website] [Paper]

Environment Setup

Run Experiments

Quick start with pre-trained weights

Reproduce the keypoints similiar to the two pre-trained checkpoints

Train & Visualize Pybullet Ant with Keypoint3D(Ours)

Train Pybullet Ant with baselines

Train & Visualize 'Close-Box' environment in Meta-world with Keypoint3D(Ours)

Train 'Close-Box' environment in Meta-world with baselines

Other environments in general

Update Log

You might also like...

Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [2021]

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

Pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering".

This is the official pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" on VQA Task

ROS-UGV-Control-Interface - Control interface which can be used in any UGV

Hand Gesture Volume Control is AIML based project which uses image processing to control the volume of your Computer.

KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control

Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation (CoRL 2021)

Official repo for the work titled "SharinGAN: Combining Synthetic and Real Data for Unsupervised GeometryEstimation"

Comments

Program gets killed while running & Request for CURL baseline

2D keypoint network implementation {Minderer et al. 2019}

Questions and request about CURL implementation

Dependency issues are raised when I try to use Mujoco-210

Owner

Boyuan Chen

A very simple baseline to estimate 2D & 3D SMPL-compatible keypoints from a single color image.

Unofficial PyTorch implementation of "RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving" (ECCV 2020)

Models, datasets and tools for Facial keypoints detection

This is the code for CVPR 2021 oral paper: Jigsaw Clustering for Unsupervised Visual Representation Learning

This repo is a PyTorch implementation for Paper "Unsupervised Learning for Cuboid Shape Abstraction via Joint Segmentation from Point Clouds"

[AAAI2021] The source code for our paper 《Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion》.

PyTorch implementation of our Adam-NSCL algorithm from our CVPR2021 (oral) paper "Training Networks in Null Space for Continual Learning"

code for our paper "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer"

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning, CVPR 2021

UniMoCo: Unsupervised, Semi-Supervised and Full-Supervised Visual Representation Learning