Author's PyTorch implementation of TD3 for OpenAI gym tasks

Overview

Addressing Function Approximation Error in Actor-Critic Methods

PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3). If you use our code or data please cite the paper.

The method is tested on MuJoCo continuous control tasks in OpenAI gym. Networks are trained using PyTorch 1.2 and Python 3.7.

Usage

The paper results can be reproduced by running:

./run_experiments.sh

Experiments on single environments can be run by calling:

python main.py --env HalfCheetah-v2

Hyper-parameters can be modified with different arguments to main.py. We include an implementation of DDPG (DDPG.py), which is not used in the paper, for easy comparison of hyper-parameters with TD3. This is not the implementation of "Our DDPG" as used in the paper (see OurDDPG.py).
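
For example, individual hyper-parameters can be overridden on the command line (the flags shown are examples; see the argparse definitions in main.py, or run python main.py --help, for the full list):

python main.py --env HalfCheetah-v2 --seed 0 --batch_size 256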

Algorithms which TD3 compares against (PPO, TRPO, ACKTR, DDPG) can be found in the OpenAI baselines repository.

Results

The code is no longer exactly representative of the code used in the paper; minor adjustments to hyper-parameters and other details have been made to improve performance. The learning curves are still the original results found in the paper.

The learning curves from the paper are found under /learning_curves. Each learning curve is formatted as a NumPy array of 201 evaluations (shape (201,)), where each evaluation corresponds to the average total reward from running the policy for 10 episodes with no exploration. The first evaluation is of the randomly initialized policy network (unused in the paper). Evaluations are performed every 5000 time steps, over a total of 1 million time steps.
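
For example, a curve can be loaded and aligned with its evaluation time steps as follows (the file name below is illustrative; see /learning_curves for the actual naming scheme):

import numpy as np

returns = np.load("learning_curves/TD3_HalfCheetah-v2_0.npy")  # shape (201,)
timesteps = np.arange(len(returns)) * 5000  # one evaluation every 5000 time steps
print(timesteps[-1], returns[-1])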

Numerical results can be found in the paper or computed from the learning curves. A video of the learned agent can be found here.

Bibtex

@inproceedings{fujimoto2018addressing,
  title={Addressing Function Approximation Error in Actor-Critic Methods},
  author={Fujimoto, Scott and Hoof, Herke and Meger, David},
  booktitle={International Conference on Machine Learning},
  pages={1582--1591},
  year={2018}
}
Comments
  • training a loaded model seems to reset model parameters

    When loading a pretrained model for further training, the performance during training seems to reset relative to the pretrained model. Bypassing the call to the train function

    policy.train(replay_buffer, args.batch_size, mean_action, scale)

    does result in performance corresponding to the trained model. policy.train seems to reset the parameters somehow, but I can't find how or where.

    opened by timkoning17 5
  • ourddpg performance not good

    I ran the OurDDPG implementation from this repository and found that performance in most games (apart from HalfCheetah-v3_0) is inconsistent with the original paper. Could you explain what the problem might be?

    opened by ChenDRAG 4
  • question about random seeds

    Hi, I tried several experiments and found that even after setting the same random seed, the results differ between runs. This confuses me. After debugging, I found that in main.py

    action = env.action_space.sample()

    This line samples different actions every run, so I want to ask whether env.action_space should also be seeded with the same random seed?
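
    For reference, a minimal sketch of seeding the action space alongside the environment (this is not taken from main.py):

    import gym

    env = gym.make("HalfCheetah-v2")
    env.seed(0)               # seeds the environment dynamics
    env.action_space.seed(0)  # makes env.action_space.sample() deterministic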

    opened by mantle2048 3
  • Calculation method of value estimate

    Thank you for your outstanding work. In your paper, the estimated value and the true value are mentioned. I would like to ask about the specific calculation method. Thank you.

    opened by wayunderfoot 3
  • New TD3 hyperparameters really improve the performance?

    Could you confirm that the new hyperparameters for TD3 (i.e. network size from [400, 300] to [256, 256], batch size from 100 to 256, learning rate from 1e-3 to 3e-4) really improve the performance?

    In my experiment, it does not demonstrate a consistent improvement.

    opened by zuoxingdong 3
  • Minimum and Array-like Actions

    The current algorithm assumes env.action_space.low = -env.action_space.high and env.action_space.high[0] = env.action_space.high[i] for all i, which is not the case for all environments.

    opened by beelerchris 2
  • Performance on Humanoid-v2?

    Hi,

    Thanks for your elegant code. I'm asking to confirm how TD3 performs on MuJoCo Humanoid-v2.

    In the SAC paper and other papers, it is reported that TD3 makes almost no progress on Humanoid-v2 (the learning curve is essentially y=0). However, we tried to reproduce this phenomenon using your code and found that TD3 could reach 6000+ test returns within 10M steps (just one seed though).

    Since you didn't report Humanoid results in your paper, and you might have changed hyperparameters to improve the results, I'm wondering which result is correct. We used the latest hyperparameters and set start_timesteps=10000, following https://github.com/sfujim/TD3/blob/master/run_experiments.sh#L11

    Thanks!

    opened by twni2016 2
  • What is the reason behind to modify the DDPG implementation?

    I notice that Section C of your paper (DDPG Network and Hyper-parameter Comparison) mentions that the DDPG architecture is different from the OpenAI Baselines one. Is there any specific reason or intuition behind this? Thank you.

    Fujimoto, S., Van Hoof, H., & Meger, D. (2018). Addressing Function Approximation Error in Actor-Critic Methods. 35th International Conference on Machine Learning, ICML 2018, 4, 2587–2601.

    opened by noklam 2
  • Why is there no noise added in DDPG?

    Thank you very much for your code, which is very clean and effective. But I have a question: the DDPG paper mentions the use of exploration noise, so why didn't you add noise to DDPG?

    opened by Kchu 2
  • Great work. Just a quick question.

    Great work! Thanks for the code.

    Just a quick question on your three main contributions (DP, TPN, CDQ): have you ever tried them on discrete-action-space games (typical ones like Space Invaders, Kung-Fu, or Seaquest)? Do the three tricks provide an improvement there as well? The last paragraph of your paper mentions that the three contributions can also be applied to discrete-action games.

    Thanks again.

    opened by hohoCode 2
  • what should be target_policy_noise target_noise_clip?

    Hello, and thank you for sharing such a great project. I'm trying to solve Humanoid-v3 and I'm having a problem setting target_policy_noise and target_noise_clip: the paper defaults of 0.2 and (-0.5, 0.5) don't work for gym Humanoid-v3 because the action space is -0.4 to 0.4. Best regards.
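
    For what it's worth, one way to adapt these settings to a different action range (a sketch, not an official recommendation) is to express them as fractions of the environment's maximum action:

    import gym

    env = gym.make("Humanoid-v3")
    max_action = float(env.action_space.high[0])
    policy_noise = 0.2 * max_action  # std of the target policy smoothing noise
    noise_clip = 0.5 * max_action    # clipping range for that noise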

    opened by MohammadAsadolahi 1
  • argparse default type

    python main.py --expl_noise=0.1 would result in

    Traceback (most recent call last):
      File "main.py", line 122, in <module>
        + np.random.normal(0, max_action * args.expl_noise, size=action_dim)
    TypeError: can't multiply sequence by non-int of type 'float'
    

    argparse parses 0.1 as a str rather than a float, so the type needs to be specified for float arguments in main.py.
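
    A minimal sketch of the fix (giving float-valued arguments an explicit type so command-line overrides are parsed correctly):

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--expl_noise", default=0.1, type=float)  # std of Gaussian exploration noise
    args = parser.parse_args()
    assert isinstance(args.expl_noise, float)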

    opened by hehonglu123 0
  • Added array-like min and max actions

    1. Added a minimum action variable, included it as the lower bound for action clipping, and modified necessary range calculations.
    2. Changed minimum and maximum action variables to be array-like instead of single floating point variables. torch.clamp doesn't support tensor min and max inputs, so torch.min and torch.max have to be used instead.

    The changes have been tested on Pendulum-v0 and some custom environments with array-like action spaces to ensure clipping and minimum action are handled properly.
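
    A minimal sketch of the clamping approach described above (the function name is illustrative):

    import torch

    def clip_action(action, min_action, max_action):
        # Element-wise clamp to per-dimension bounds; torch.min/torch.max accept
        # tensor arguments where torch.clamp here only accepts scalars.
        return torch.max(torch.min(action, max_action), min_action)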

    opened by beelerchris 0
Owner
Scott Fujimoto