Visual Adversarial Imitation Learning using Variational Models (VMAIL)

Last update: Nov 18, 2022

Overview

This is the official implementation of the NeurIPS 2021 paper.

Method

VMAIL simultaneously learns a variational dynamics model and trains an on-policy adversarial imitation learning algorithm in the latent space using only model-based rollouts. This allows for stable and sample-efficient training, as well as zero-shot imitation learning by transferring the learned dynamics model.
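
To make the adversarial part of the description above concrete, here is a minimal sketch of a GAIL-style discriminator objective evaluated on scalar logits. It is not taken from this repository: the function names, the binary cross-entropy form, and the -log(1 - D) reward are assumptions chosen for illustration, and the actual VMAIL losses may differ. In the setting described above, the logits would come from a discriminator applied to latent states of expert trajectories and of rollouts imagined by the learned dynamics model.

```python
import math

EPS = 1e-8  # small constant to keep the logarithms finite


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def discriminator_loss(expert_logits, policy_logits):
    """Binary cross-entropy with expert samples labeled 1 and policy samples labeled 0."""
    expert_term = -sum(math.log(sigmoid(l) + EPS) for l in expert_logits) / len(expert_logits)
    policy_term = -sum(math.log(1.0 - sigmoid(l) + EPS) for l in policy_logits) / len(policy_logits)
    return expert_term + policy_term


def imitation_reward(policy_logit):
    """Assumed GAIL-style reward -log(1 - D): grows as the discriminator rates the sample as expert-like."""
    return -math.log(1.0 - sigmoid(policy_logit) + EPS)


if __name__ == "__main__":
    # An undecided discriminator (all logits 0) versus one that separates the two sets.
    print(discriminator_loss([0.0, 0.0], [0.0, 0.0]))
    print(discriminator_loss([3.0, 3.0], [-3.0, -3.0]))
    print(imitation_reward(-2.0), imitation_reward(2.0))
```

The discriminator is trained to lower this loss, while the policy is trained to raise the reward, which is what makes the two objectives adversarial.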

Instructions

Get dependencies:

conda env create -f vmail.yml
conda activate vmail
cd robel_claw/robel
pip install -e .

To train agents for each environment, download the expert data from the provided link and run:

python3 -u vmail.py --logdir ./logdir --expert_datadir expert_datadir

Training will generate TensorBoard plots and GIFs in the log folder:

tensorboard --logdir ./logdir

Citation

If you find this code useful, please cite it in your paper:

@article{rafailov2021visual,
      title={Visual Adversarial Imitation Learning using Variational Models}, 
      author={Rafael Rafailov and Tianhe Yu and Aravind Rajeswaran and Chelsea Finn},
      year={2021},
      journal={Neural Information Processing Systems}
}

Comments

Weird dm_control.rl.control.PhysicsError when running vmail dm_control dmc_walker_stand

I changed the config to use the walker_stand expert data:

config.task = 'dmc_walker_stand'
config.logdir = pathlib.Path('walker_stand_logdir')
config.model_datadir = pathlib.Path('walker_stand_model_data')
config.policy_datadir = pathlib.Path('walker_stand_policy_data')
config.expert_datadir = pathlib.Path('walker_stand_expert')

When I run vmail.py, it aborts after several training steps with the error 'dm_control.rl.control.PhysicsError: Physics state is invalid'.

[14000] expl_amount 0.3 / model_grad_norm nan / discriminator_norm nan / value_grad_norm 10.5 / actor_grad_norm 0.1 / prior_ent 103.9 / post_ent 1.3 / expert_d nan / policy_d 0.4 / max_policy_d 0.6 / rewards 0.4 / image_loss 122644302965768192.0 / div 75601088086016.0 / model_loss 122719902980112384.0 / expert_loss nan / policy_loss -0.6 / discriminator_loss nan / discriminator_penalty nan / value_loss 1.7 / actor_loss -2.4 / action_ent -92.7 / fps 5.2
Test episode of length 1000 with return 163.4.
Start collection.
Train episode of length 1000 with return 149.5.
Start evaluation.
Training for 200 steps.
[15000] expl_amount 0.3 / model_grad_norm nan / discriminator_norm nan / value_grad_norm 10.7 / actor_grad_norm 0.1 / prior_ent 105.9 / post_ent -8.3 / expert_d 0.6 / policy_d 0.4 / max_policy_d 0.6 / rewards 0.4 / image_loss 37099197358407680.0 / div 8276778418176.0 / model_loss 37107473760387072.0 / expert_loss -502834048.0 / policy_loss -0.5 / discriminator_loss 82440532736287228001048484556832768.0 / discriminator_penalty 82440532736287228001048484556832768.0 / value_loss 1.7 / actor_loss -2.5 / action_ent -94.0 / fps 5.3
python-BaseException
WARNING:absl:Unknown warning type Time = 0.0000.
Traceback (most recent call last):
  File "/home/liu/.miniconda3/envs/vmail/lib/python3.7/contextlib.py", line 119, in __exit__
    next(self.gen)
  File "/home/liu/.miniconda3/envs/vmail/lib/python3.7/site-packages/dm_control/mujoco/engine.py", line 332, in check_invalid_state
    raise _control.PhysicsError(message)
dm_control.rl.control.PhysicsError: Physics state is invalid. Warning(s) raised: mjWARN_BADCTRL

Process finished with exit code 1

Opened by fokx
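
One plausible reading of the log above: model_grad_norm is already nan at step 14000, and mjWARN_BADCTRL is the MuJoCo warning for a NaN or infinite value in the control vector, so the crash may be a downstream symptom of the training losses diverging rather than a simulator problem. The sketch below is a hypothetical debugging aid, not part of this repository. It shows two checks that could help locate the first bad value: one on the logged metrics and one on the action before it is passed to the environment. The function names and the [-1, 1] action bounds are assumptions.

```python
import math


def non_finite_metrics(metrics):
    """Return the names of logged metrics whose value is NaN or infinite."""
    return [name for name, value in metrics.items() if not math.isfinite(value)]


def check_action(action, low=-1.0, high=1.0):
    """Raise on NaN/inf entries; otherwise return the action clipped to [low, high]."""
    values = [float(a) for a in action]
    bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
    if bad:
        raise ValueError(f"non-finite action entries at indices {bad}")
    return [min(max(v, low), high) for v in values]


if __name__ == "__main__":
    # A few values copied from the step-14000 log line above.
    logged = {
        "model_grad_norm": float("nan"),
        "value_grad_norm": 10.5,
        "expert_d": float("nan"),
        "policy_d": 0.4,
    }
    print(non_finite_metrics(logged))
    print(check_action([0.5, 2.0, -3.0]))
```

Failing early on the first non-finite metric makes it easier to tell whether the divergence starts in the model loss, the discriminator, or the expert data.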
