Visual Adversarial Imitation Learning using Variational Models (VMAIL)

This is the official implementation of the NeurIPS 2021 paper.

Method

VMAIL simultaneously learns a variational dynamics model and trains an on-policy adversarial imitation learning algorithm in the latent space using only model-based rollouts. This allows for stable and sample-efficient training, as well as zero-shot imitation learning by transferring the learned dynamics model.
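The idea of adversarial imitation inside a learned latent space can be illustrated with a toy, dependency-free sketch. Everything in it is an assumption for illustration and not VMAIL's code. A hand-written linear transition (`latent_step`) stands in for the learned variational dynamics model. A fixed logistic discriminator (`discriminator`) stands in for the trained one. The per-step reward log D - log(1 - D) is one common GAIL-style choice, and the paper's exact reward may differ. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy stand-ins: VMAIL learns these components with neural networks.

def latent_step(z: Sequence[float], a: float) -> List[float]:
    """Toy deterministic latent transition standing in for the learned dynamics model."""
    return [0.9 * z[0] + 0.1 * a, 0.9 * z[1] - 0.1 * a]

def discriminator(z: Sequence[float], a: float, w: Sequence[float], b: float) -> float:
    """Logistic discriminator: estimated probability that (z, a) comes from the expert."""
    logit = w[0] * z[0] + w[1] * z[1] + w[2] * a + b
    return 1.0 / (1.0 + math.exp(-logit))

def adversarial_reward(d: float, eps: float = 1e-6) -> float:
    """GAIL-style reward log D - log(1 - D); equals the discriminator logit up to clamping."""
    d = min(max(d, eps), 1.0 - eps)
    return math.log(d) - math.log(1.0 - d)

def imagine_rollout(
    z0: Sequence[float],
    policy: Callable[[Sequence[float]], float],
    horizon: int,
    w: Sequence[float],
    b: float,
) -> Tuple[List[List[float]], List[float]]:
    """Roll the policy forward purely in latent space (no environment steps) and score each step."""
    z = list(z0)
    states, rewards = [], []
    for _ in range(horizon):
        a = policy(z)
        states.append(z)
        rewards.append(adversarial_reward(discriminator(z, a, w, b)))
        z = latent_step(z, a)
    return states, rewards

def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Discounted sum of imagined rewards, the quantity an on-policy learner would maximize."""
    ret = 0.0
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret

# Usage: a hand-picked policy and discriminator weights, both hypothetical.
w, b = [1.0, -1.0, 0.5], 0.0
states, rewards = imagine_rollout([1.0, 0.0], lambda z: -z[0], horizon=5, w=w, b=b)
print(len(rewards), round(discounted_return(rewards), 3))
```

In the actual method, the dynamics model and the discriminator are both trained, and the policy is updated to maximize the imagined return. Those updates are omitted here so the reward computation on model-based rollouts stays visible.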

Instructions

Install the dependencies:

conda env create -f vmail.yml
conda activate vmail
cd robel_claw/robel
pip install -e .

To train an agent on each environment, download the expert data from the provided link and run:

python3 -u vmail.py --logdir ./logdir --expert_datadir expert_datadir

Training generates TensorBoard plots and GIFs in the log folder. To view them, run:

tensorboard --logdir ./logdir

Citation

If you find this code useful, please cite our paper:

@article{rafailov2021visual,
      title={Visual Adversarial Imitation Learning using Variational Models}, 
      author={Rafael Rafailov and Tianhe Yu and Aravind Rajeswaran and Chelsea Finn},
      year={2021},
      journal={Neural Information Processing Systems}
}
You might also like...
This is an example implementation of the paper "Cross Domain Robot Imitation with Invariant Representation".

IR-GAIL This is an example implementation of the paper "Cross Domain Robot Imitation with Invariant Representation". Dependency The experiments are de

[CVPR 2022] PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision (Oral)

PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision Kehong Gong*, Bingbing Li*, Jianfeng Zhang*, Ta

This repository contains the data and code for the paper "Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Process Priors" (SPNLP@ACL2022)

GP-VAE This repository provides datasets and code for preprocessing, training and testing models for the paper: Diverse Text Generation via Variationa

improvement of CLIP features over the traditional resnet features on the visual question answering, image captioning, navigation and visual entailment tasks.

CLIP-ViL In our paper "How Much Can CLIP Benefit Vision-and-Language Tasks?", we show the improvement of CLIP features over the traditional resnet fea

SAAVN - Sound Adversarial Audio-Visual Navigation,ICLR2022 (In PyTorch)

SAAVN SAAVN Code release for paper "Sound Adversarial Audio-Visual Navigation,IC

Code for the paper: Adversarial Training Against Location-Optimized Adversarial Patches. ECCV-W 2020.

Adversarial Training Against Location-Optimized Adversarial Patches arXiv | Paper | Code | Video | Slides Code for the paper: Sukrut Rao, David Stutz,

Adversarial Color Enhancement: Generating Unrestricted Adversarial Images by Optimizing a Color Filter

ACE Please find the preliminary version published at BMVC 2020 in the folder BMVC_version, and its extended journal version in Journal_version. Datase

transfer attack; adversarial examples; black-box attack; unrestricted Adversarial Attacks on ImageNet; CVPR2021 Tianchi black-box competition

transfer_adv CVPR-2021 AIC-VI: unrestricted Adversarial Attacks on ImageNet. CVPR2021 Security AI Challenger Program, Phase 6, Track 2: Unrestricted Adversarial Attacks on ImageNet. Introduction: Deep neural networks have achieved state-of-the-art performance on a wide range of visual recognition problems.

Adversarial-Information-Bottleneck - Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck (NeurIPS21)
Comments
  • Weird dm_control.rl.control.PhysicsError when running vmail dm_control dmc_walker_stand

    I changed the config to use walker_stand expert data,

    config.task = 'dmc_walker_stand'
    config.logdir = pathlib.Path('walker_stand_logdir')
    config.model_datadir = pathlib.Path('walker_stand_model_data')
    config.policy_datadir = pathlib.Path('walker_stand_policy_data')
    config.expert_datadir = pathlib.Path('walker_stand_expert')
    

    When I ran vmail.py, it aborted after several training steps with the error 'dm_control.rl.control.PhysicsError: Physics state is invalid'.

    [14000] expl_amount 0.3 / model_grad_norm nan / discriminator_norm nan / value_grad_norm 10.5 / actor_grad_norm 0.1 / prior_ent 103.9 / post_ent 1.3 / expert_d nan / policy_d 0.4 / max_policy_d 0.6 / rewards 0.4 / image_loss 122644302965768192.0 / div 75601088086016.0 / model_loss 122719902980112384.0 / expert_loss nan / policy_loss -0.6 / discriminator_loss nan / discriminator_penalty nan / value_loss 1.7 / actor_loss -2.4 / action_ent -92.7 / fps 5.2
    Test episode of length 1000 with return 163.4.
    Start collection.
    Train episode of length 1000 with return 149.5.
    Start evaluation.
    Training for 200 steps.
    [15000] expl_amount 0.3 / model_grad_norm nan / discriminator_norm nan / value_grad_norm 10.7 / actor_grad_norm 0.1 / prior_ent 105.9 / post_ent -8.3 / expert_d 0.6 / policy_d 0.4 / max_policy_d 0.6 / rewards 0.4 / image_loss 37099197358407680.0 / div 8276778418176.0 / model_loss 37107473760387072.0 / expert_loss -502834048.0 / policy_loss -0.5 / discriminator_loss 82440532736287228001048484556832768.0 / discriminator_penalty 82440532736287228001048484556832768.0 / value_loss 1.7 / actor_loss -2.5 / action_ent -94.0 / fps 5.3
    python-BaseException
    WARNING:absl:Unknown warning type Time = 0.0000.
    Traceback (most recent call last):
      File "/home/liu/.miniconda3/envs/vmail/lib/python3.7/contextlib.py", line 119, in __exit__
        next(self.gen)
      File "/home/liu/.miniconda3/envs/vmail/lib/python3.7/site-packages/dm_control/mujoco/engine.py", line 332, in check_invalid_state
        raise _control.PhysicsError(message)
    dm_control.rl.control.PhysicsError: Physics state is invalid. Warning(s) raised: mjWARN_BADCTRL
    
    Process finished with exit code 1
    
    opened by fokx
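In the log above, `model_grad_norm` is already `nan` and `image_loss` / `model_loss` are enormous well before the crash. MuJoCo raises `mjWARN_BADCTRL` when a control input is NaN, infinite, or extremely large. Together these suggest that the model diverged and then produced invalid actions. The following is a minimal sketch, not part of VMAIL, for scanning the `[step] key value / key value` lines printed above so a run can be stopped at the first sign of divergence. It also includes an action guard. The function names, the 1e12 threshold, and the assumed action bounds of [-1, 1] are hypothetical choices.

```python
import math
import re
from typing import Dict, List, Tuple

# Matches metric lines such as "[14000] expl_amount 0.3 / model_grad_norm nan / ..."
_METRIC_LINE = re.compile(r"^\s*\[(\d+)\]\s+(.*)$")

def parse_metrics(line: str) -> Tuple[int, Dict[str, float]]:
    """Parse one metrics line into (step, {name: value})."""
    match = _METRIC_LINE.match(line)
    if match is None:
        raise ValueError(f"not a metrics line: {line!r}")
    step = int(match.group(1))
    metrics: Dict[str, float] = {}
    for item in match.group(2).split(" / "):
        name, value = item.strip().rsplit(" ", 1)
        metrics[name] = float(value)  # float() also accepts "nan" and "inf"
    return step, metrics

def diverged_metrics(metrics: Dict[str, float], limit: float = 1e12) -> List[str]:
    """Sorted names of metrics that are NaN, infinite, or larger in magnitude than `limit`."""
    return sorted(
        name for name, value in metrics.items()
        if not math.isfinite(value) or abs(value) > limit
    )

def sanitize_action(action: List[float], low: float = -1.0, high: float = 1.0) -> List[float]:
    """Replace non-finite entries with 0.0 and clip the rest to [low, high] (bounds assumed)."""
    return [min(max(a, low), high) if math.isfinite(a) else 0.0 for a in action]

# Usage on a shortened version of the logged line.
line = "[14000] expl_amount 0.3 / model_grad_norm nan / image_loss 122644302965768192.0 / rewards 0.4"
step, metrics = parse_metrics(line)
print(step, diverged_metrics(metrics))  # prints: 14000 ['image_loss', 'model_grad_norm']
```

Clipping actions only hides the symptom, because the diverging world model is the underlying problem. Stopping or restoring a checkpoint when `diverged_metrics` returns a non-empty list is likely the more useful response.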
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

Ilya Kostrikov 3k Dec 31, 2022
Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning Source Code

STARS Laboratory 8 Sep 14, 2022
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech Jaehyeon Kim, Jungil Kong, and Juhee Son In our rece

Jaehyeon Kim 1.7k Jan 8, 2023
Scripts for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation and a convolutional neural network (CNN) for image classification

About subwAI subwAI - a project for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation

null 82 Jan 1, 2023
LBK 20 Dec 2, 2022
Disagreement-Regularized Imitation Learning

Due to a normalization bug the expert trajectories have lower performance than the rl_baseline_zoo reported experts. Please see the following link in

Kianté Brantley 25 Apr 28, 2022
ilpyt: imitation learning library with modular, baseline implementations in Pytorch

ilpyt The imitation learning toolbox (ilpyt) contains modular implementations of common deep imitation learning algorithms in PyTorch, with unified in

The MITRE Corporation 11 Nov 17, 2022
Code for NeurIPS 2021 paper: Invariant Causal Imitation Learning for Generalizable Policies

Invariant Causal Imitation Learning for Generalizable Policies Ioana Bica, Daniel Jarrett, Mihaela van der Schaar Neural Information Processing System

Ioana Bica 17 Dec 1, 2022
PyTorch implementation of SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching

SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching This is the official PyTorch implementation of SMODICE: Versatile Offline I

Jason Ma 14 Aug 30, 2022
Pytorch code for "State-only Imitation with Transition Dynamics Mismatch" (ICLR 2020)

This repo contains code for our paper State-only Imitation with Transition Dynamics Mismatch published at ICLR 2020. The code heavily uses the RL mach

null 20 Sep 8, 2022