ATAC: Adversarially Trained Actor Critic

Overview

Adversarially Trained Actor Critic for Offline Reinforcement Learning by Ching-An Cheng*, Tengyang Xie*, Nan Jiang, and Alekh Agarwal.
https://arxiv.org/abs/2202.02446

Setup

Clone the repository and create a conda environment.

git clone https://github.com/microsoft/ATAC.git
cd ATAC
conda create -n atac python=3.8

Prerequisite: Install Mujoco

(Optional) Install free mujoco210 for mujoco_py and mujoco211 for dm_control.

bash install_mujoco.sh
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.mujoco/mujoco210/bin:/usr/lib/nvidia" >> ~/.bashrc
source ~/.bashrc

Install ATAC

conda activate atac
pip install -e .[mujoco210]
# or run the line below if the original paid MuJoCo (mujoco200) is used.
pip install -e .[mujoco200]
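
If the install succeeds, the MuJoCo bindings and the D4RL datasets used by the training script should be importable. As a quick sanity check (this assumes the mujoco210 extra pulls in mujoco_py and d4rl, which the D4RL environment names used later on this page suggest it does):

python -c "import mujoco_py, d4rl; print('mujoco_py and d4rl import OK')"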

Run ATAC

python scripts/main.py
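
The script also accepts command-line options for selecting the D4RL dataset, GPU, and random seed; see scripts/main.py for the full set of flags. For example (the flags here are taken from a user-reported run quoted in the comments below):

python scripts/main.py -e hopper-medium-expert-v2 --gpu_id 0 --seed 15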

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.


Comments
  • Difference with Conservative-Q Learning (CQL)

    The relative pessimism objective (1)(2) proposed in ATAC seems exactly the same as the learning objective (3) in [1], and Algorithm 2 in ATAC looks remarkably similar to Algorithm 1 in [1], setting aside some implementation caveats. Could you explain what the major difference is between ATAC and CQL?

    [1] Kumar et al. Conservative Q-Learning for Offline Reinforcement Learning. NeurIPS 2020.

    opened by emailweixu 5
  • Why training ends at epoch 50?

    Hello, I have tried to reproduce ATAC's results from the paper. However, when I run the official code, the experiment automatically ends at epoch 50, and I cannot find where the problem is. Could you give me some help? For example, I have run 'python scripts/main.py -e hopper-medium-expert-v2 --gpu_id 0 --seed 15'. Are there any other hyperparameters that need to be given? @chinganc

    opened by yuxudong20 4
  • Question about D4RL MuJoCo benchmark

    Thanks for sharing the code. I have one question. It seems like you are using D4RL v2 (C.2.), and in Table 1 you mention that "the baseline results are from the respective papers". However, some previous papers used D4RL v0, and I believe the buffer quality varies between v0 and v2 (see the TD3BC paper). Thus, the comparison might be biased.

    opened by HYDesmondLiu 4
  • [Bug in win11] when I run "main.py"

    Hi, thank you very much for providing such a good open-source algorithm, but I am having problems when running "main.py" on Windows.

    Warning: Flow failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message. No module named 'flow'
    Warning: FrankaKitchen failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message. No module named 'mujoco'
    Warning: CARLA failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message. No module named 'carla'
    pybullet build time: Nov 5 2022 13:03:11
    Traceback (most recent call last):
      File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\record_writer.py", line 58, in open_file
        factory = REGISTERED_FACTORIES[prefix]
    KeyError: '.\exp_data\OfflineATAC_hopper-medium-replay-v2\beta_16_discount_0.99_norm_constraint_100_policy_lr_5e-07_value_lr_0.0005_use_two_qfs_True_fixed_alpha_None_q_eval_mode_0.5_0.5_n_warmstart_steps_100000_seed_0\events.out.tfevents.1667737712.Pavelzzp'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "D:\pythonproject\ATAC-master\scripts\main.py", line 234, in <module>
        run(**train_kwargs)
      File "D:\pythonproject\ATAC-master\scripts\main.py", line 197, in run
        full_score = train_agent(train_func,
      File "c:\users\pavel\atac\src\atac\garage_tools\rl_utils.py", line 47, in train_agent
        score = wrapped_train_func(**train_kwargs)
      File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\garage\experiment\experiment.py", line 368, in __call__
        ctxt = self._make_context(self._get_options(*args), **kwargs)
      File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\garage\experiment\experiment.py", line 329, in _make_context
        dowel.TensorBoardOutput(log_dir, x_axis=options['x_axis']))
      File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\dowel\tensor_board_output.py", line 57, in __init__
        self._writer = tbX.SummaryWriter(log_dir, flush_secs=flush_secs)
      File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\writer.py", line 301, in __init__
        self._get_file_writer()
      File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\writer.py", line 349, in _get_file_writer
        self.file_writer = FileWriter(logdir=self.logdir,
      File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\writer.py", line 105, in __init__
        self.event_writer = EventFileWriter(
      File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\event_file_writer.py", line 106, in __init__
        self._ev_writer = EventsWriter(os.path.join(
      File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\event_file_writer.py", line 43, in __init__
        self._py_recordio_writer = RecordWriter(self._file_name)
      File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\record_writer.py", line 179, in __init__
        self._writer = open_file(path)
      File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\record_writer.py", line 61, in open_file
        return open(path, 'wb')
    FileNotFoundError: [Errno 2] No such file or directory: '.\exp_data\OfflineATAC_hopper-medium-replay-v2\beta_16_discount_0.99_norm_constraint_100_policy_lr_5e-07_value_lr_0.0005_use_two_qfs_True_fixed_alpha_None_q_eval_mode_0.5_0.5_n_warmstart_steps_100000_seed_0\events.out.tfevents.1667737712.Pavelzzp'
    Exception ignored in: <function LogOutput.__del__ at 0x0000020BA6A3D9D0>

    Traceback (most recent call last):
      File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\dowel\logger.py", line 176, in __del__
        self.close()
      File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\dowel\tensor_board_output.py", line 156, in close
        self._writer.close()
    AttributeError: 'TensorBoardOutput' object has no attribute '_writer'

    How can I solve this problem? Thank you very much.

    opened by PavelZhao 2