Implementation of algorithms for continuous control (DDPG and NAF).

Overview

DEPRECATION

This repository is deprecated and is no longer maintained. Please see jax-sac for a more recent implementation of RL for continuous control.

Description

A reimplementation of "Continuous Deep Q-Learning with Model-based Acceleration" (NAF) and "Continuous control with deep reinforcement learning" (DDPG).
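
In the NAF paper the Q-function is split into a state value and a quadratic advantage: Q(x, u) = V(x) + A(x, u), with A(x, u) = -1/2 (u - mu(x))^T P(x) (u - mu(x)) and P(x) = L(x) L(x)^T, where L is lower triangular with an exponentiated diagonal. The following is a minimal PyTorch sketch of that decomposition. It assumes the network already outputs V, mu, and an n x n matrix of raw entries for L. The function name naf_q_value and the tensor shapes are illustrative and are not this repository's API.

```python
import torch


def naf_q_value(v, mu, l_entries, u):
    """Q(x, u) = V(x) - 1/2 (u - mu)^T L L^T (u - mu).

    Illustrative shapes (assumptions, not the repository's layout):
      v:         (batch, 1)    state values V(x)
      mu:        (batch, n)    greedy actions mu(x)
      l_entries: (batch, n, n) raw outputs; only the lower triangle is used
      u:         (batch, n)    actions to evaluate
    """
    # Keep the strictly lower triangle and exponentiate the diagonal so that
    # L has a positive diagonal and P = L L^T is positive definite.
    lower = torch.tril(l_entries, diagonal=-1)
    diag = torch.diag_embed(torch.exp(torch.diagonal(l_entries, dim1=-2, dim2=-1)))
    L = lower + diag
    P = L @ L.transpose(1, 2)                                   # (batch, n, n)
    d = (u - mu).unsqueeze(2)                                   # (batch, n, 1)
    advantage = -0.5 * (d.transpose(1, 2) @ P @ d).squeeze(2)   # (batch, 1)
    return v + advantage
```

Because P is positive definite, the advantage is never positive and Q is maximized exactly at u = mu(x). This is what allows NAF to pick the greedy action in closed form.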

Contributions are welcome. If you know how to make it more stable, don't hesitate to send a pull request.

Run

Use the default hyperparameters.

For NAF:

python main.py --algo NAF --env-name HalfCheetah-v2

For DDPG:

python main.py --algo DDPG --env-name HalfCheetah-v2

Comments
  • Having trouble running code with HalfCheetah

    I tried running the code with HalfCheetah after commenting out the wrappers.Monitor(...) line and every line that renders the result, and got this error:

    Traceback (most recent call last):
      File "main.py", line 98, in <module>
        agent.update_parameters(batch)
      File "/home/sbhupatiraju/pytorch-ddpg-naf/naf.py", line 126, in update_parameters
        loss.backward()
      File "/home/sbhupatiraju/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 156, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
      File "/home/sbhupatiraju/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 98, in backward
        variables, grad_variables, retain_graph)
    RuntimeError: element 0 of variables tuple is volatile

    Any idea on what might be going on?

    opened by suryabhupa 9
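
    One likely cause, offered tentatively: in PyTorch versions before 0.4, a Variable created with volatile=True made every result computed from it volatile as well, and a volatile loss cannot be backpropagated. From 0.4 onward, the usual pattern is to compute only the bootstrapped target inside torch.no_grad(). The following is a minimal sketch under that assumption. td_loss and the generic value_net/target_net modules are hypothetical stand-ins for the NAF networks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def td_loss(value_net, target_net, state, reward, mask, next_state, gamma=0.99):
    # Hypothetical helper. The bootstrapped target is built without recording
    # an autograd graph; torch.no_grad() replaces Variable(..., volatile=True).
    with torch.no_grad():
        target = reward + gamma * mask * target_net(next_state)
    # The prediction is computed with gradients enabled, so loss.backward() works.
    prediction = value_net(state)
    return F.mse_loss(prediction, target)
```

    Only the prediction carries a graph. As a result, loss.backward() produces gradients for value_net, while target_net receives none.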
  • Add parameter noise to ddpg

    I implemented parameter noise for DDPG, as described here: https://blog.openai.com/better-exploration-with-parameter-noise/

    I found that the trained policies resulting from DDPG with OU noise were similar to the one shown in the blog post above; the agent flipped onto its head and scooted forward, with the reward plateauing around 1000. With parameter noise, the learned policy looked more like running and ended with rewards of 3000+.

    I modified the policy network architecture for DDPG to include layernorm, which is necessary for using parameter noise.

    Parameter noise is not implemented yet for NAF, because I am not too familiar with the algorithm. The only change in the NAF code is an extra argument to select_action to make the interface the same as DDPG’s.

    I also added utility functions for saving and loading networks.

    To-do: decay the parameter noise, as is done for the OU noise.

    opened by kazuotani14 3
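
    The PR describes two pieces: a DDPG actor with layer normalization, and a perturbed copy of that actor used for exploration. The following is a minimal sketch of both. Several details are assumptions and are not taken from the PR's code: the Actor class, its hidden size, and the choice to perturb only the Linear layers (leaving LayerNorm gains and biases untouched).

```python
import copy

import torch
import torch.nn as nn


class Actor(nn.Module):
    # Illustrative deterministic policy with LayerNorm after each hidden layer,
    # which keeps activations on a similar scale when the weights are perturbed.
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)


def perturbed_copy(actor, stddev):
    # Copy the actor and add isotropic Gaussian noise to its Linear layers.
    # Assumption: LayerNorm parameters are left unperturbed.
    noisy = copy.deepcopy(actor)
    with torch.no_grad():
        for module in noisy.modules():
            if isinstance(module, nn.Linear):
                for p in module.parameters():
                    p.add_(torch.randn_like(p) * stddev)
    return noisy
```

    In this scheme, rollouts act with the perturbed copy while gradient updates go to the unperturbed actor. A fresh perturbed copy is typically drawn at the start of each episode.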
  • Typo in the Implementation

    https://github.com/ikostrikov/pytorch-ddpg-naf/blob/d2e587a741e5220dfc6eceb7896ba7d6fe5c2630/naf.py#L121

    Current target: r_t + \gamma * mask + v_{t+1}
    Correct target: r_t + \gamma * mask * v_{t+1}

    opened by Akella17 1
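
    The following is a minimal sketch of the corrected bootstrapped target. It assumes mask is 1 for non-terminal transitions and 0 for terminal ones, and that next_value holds v_{t+1} from the target network. naf_target is a hypothetical name.

```python
import torch


def naf_target(reward, mask, next_value, gamma=0.99):
    # y_t = r_t + gamma * mask * v_{t+1}. The mask zeroes the bootstrap term
    # at episode ends; adding gamma * mask instead of multiplying is the bug.
    return reward + gamma * mask * next_value
```

    Take gamma = 0.5, a reward of 1, and v_{t+1} = 2. The target is then 2 for a non-terminal step and 1 for a terminal one, whereas the additive version would give 3.5 and 3.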
  • pytorch-rl

    Implemented DDPG and tried to make it as close as possible to your implementation of NAF.

    I've also made some changes in NAF (seems to work better now, at least for Pendulum-v0).

    (Both algorithms seem to give similar results for Pendulum-v0 but NAF has a bit more variance compared to DDPG).

    [Result plots: ddpg, naf]

    opened by pranz24 1
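
    Both papers train against slowly tracking target networks. A shared Polyak-averaging helper is therefore one natural piece to keep identical across the two implementations. The following is a minimal sketch. soft_update and hard_update are hypothetical helper names, and tau is whatever value the training script passes in.

```python
import torch
import torch.nn as nn


def soft_update(target, source, tau):
    # Polyak averaging: theta_target <- tau * theta_source + (1 - tau) * theta_target.
    with torch.no_grad():
        for t, s in zip(target.parameters(), source.parameters()):
            t.mul_(1.0 - tau).add_(s, alpha=tau)


def hard_update(target, source):
    # Copy the source weights exactly, e.g. when the networks are first built.
    target.load_state_dict(source.state_dict())
```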
  • Error out of memory

    Hi,

    I am doing some work on RL and am very interested in these two algorithms. I have tried to train your models on both CPU and GPU, but both runs failed with an "out of memory" error. Memory usage kept increasing, as if the data and/or the model from earlier steps were never released. My code is very similar to the example:

        action = agent.select_action(state, ounoise, param_noise)
        next_state, reward, done, info = env.step(action.cpu().numpy()[0])
        total_numsteps += 1
        episode_reward += reward
    
        action = torch.Tensor(action.cpu())
        mask = torch.Tensor([not done])
        next_state = torch.Tensor(next_state.cpu())
        reward = torch.Tensor([reward])
    
        # pdb.set_trace()
        memory.push(state, action, mask, next_state, reward)
        state = next_state
    
        if len(memory) > args.batch_size:
            for _ in range(args.updates_per_step):
                transitions = memory.sample(args.batch_size)
                batch = Transition(*zip(*transitions))
    
                value_loss, policy_loss = agent.update_parameters(batch)
    
                writer.add_scalar('loss/value', value_loss, updates)
                writer.add_scalar('loss/policy', policy_loss, updates)
    
                updates += 1
    

    Could you please help me solve this problem? Thanks in advance.

    opened by bo-wu 0
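
    Steadily growing memory in a PyTorch training loop often comes from storing tensors that still reference an autograd graph, or that still live on the GPU, in the replay buffer. An unbounded buffer has a similar effect. The following is a minimal sketch of a buffer that stores detached CPU tensors with a fixed capacity. It assumes the Transition field order used in the loop above. The class is illustrative and is not necessarily how this repository's ReplayMemory is written.

```python
import random
from collections import deque, namedtuple

import torch

Transition = namedtuple('Transition', ('state', 'action', 'mask', 'next_state', 'reward'))


class ReplayMemory:
    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.memory = deque(maxlen=capacity)

    def push(self, *args):
        # Detach and move to CPU so stored tensors keep neither an autograd
        # graph nor GPU memory alive.
        self.memory.append(Transition(*(torch.as_tensor(a).detach().cpu() for a in args)))

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
```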
  • actions typo?

    https://github.com/ikostrikov/pytorch-ddpg-naf/blob/7870655178fe93a4eba5450257323f00ff8a227f/normalized_actions.py#L16

    Should this return action, not actions?

    opened by horacepan 0
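
    The wrapper in normalized_actions.py rescales between the policy's output range and the environment's action bounds. The following is a standalone sketch of that affine map and its inverse, with each function returning the variable it computes. It assumes the policy emits values in [-1, 1], as a tanh output layer does. The function names are hypothetical.

```python
import numpy as np


def to_env_action(action, low, high):
    # Map a policy output in [-1, 1] to the environment's [low, high] box.
    action = (np.asarray(action, dtype=np.float64) + 1.0) / 2.0
    return low + action * (high - low)


def to_policy_action(action, low, high):
    # Inverse map: an environment action in [low, high] back to [-1, 1].
    action = np.asarray(action, dtype=np.float64)
    return 2.0 * (action - low) / (high - low) - 1.0
```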
  • NAF Implementation not working!

    The NAF algorithm does not work on Pendulum or on any of the PyBullet environments. @ikostrikov Do you have any guesses why that might be the case? Which environments did you test this code on? If you used hyperparameters other than the defaults, could you list the changes needed to get NAF working?

    opened by Akella17 2
  • t was not defined, I assumed to be equal to batch size and computing …

    Adaptive noise estimation requires a sample of states for the expectation operator, over which the DDPG distance metric between perturbed and unperturbed actions is computed. The sample index "t" was not defined; assuming it equals the batch size, the metric can now be computed.

    opened by gsp-27 0
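
    The adaptive scheme from the parameter-noise work measures the root-mean-square difference between the actions of the unperturbed and perturbed policies over a set of states, then scales the noise toward a desired distance. The following is a minimal sketch that uses a sampled batch of states for the expectation, in line with this PR's assumption that t equals the batch size. ddpg_action_distance, adapt_stddev, and the default coefficient of 1.01 are illustrative choices, not values read from the PR.

```python
import torch


def ddpg_action_distance(actor, perturbed_actor, states):
    # d = sqrt(mean over states and action dimensions of (pi(s) - pi~(s))^2),
    # estimated on a batch of sampled states.
    with torch.no_grad():
        diff = actor(states) - perturbed_actor(states)
    return torch.sqrt(torch.mean(diff ** 2)).item()


def adapt_stddev(stddev, distance, desired_distance, coefficient=1.01):
    # Shrink the parameter noise when actions move too far; grow it otherwise.
    if distance > desired_distance:
        return stddev / coefficient
    return stddev * coefficient
```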
  • fixed NAF reward discount

    Hi Ilya,

    Thanks for your open-source implementation of DDPG/NAF in pytorch.

    We spotted a typo in NAF: the discount factor (and the done mask) should multiply next_state_values instead of being added to them.

    Cheers, Jacques

    opened by jackokaiser 0
  • benchmarking the repo

    Hi @ikostrikov, I appreciate your implementation. Have you benchmarked it? If so, could you share some rough results? Many thanks!

    opened by andrewliao11 1
  • AttributeError for gradient clipping

    Hi ikostrikov, I got this error when running your code:

    Traceback (most recent call last):
      File "main.py", line 89, in <module>
        agent.update_parameters(batch)
      File "/home/andrewliao11/Work/pytorch-naf/naf.py", line 121, in update_parameters
        param.grad.data.clamp(-1, 1)
    AttributeError: 'NoneType' object has no attribute 'data'
    

    the original code is:

    for param in self.model.parameters():
        param.grad.data.clamp(-1, 1)
    

    Maybe we should change it to:

    torch.nn.utils.clip_grad_norm(self.model.parameters(), 1)
    

    I'm just a newbie to PyTorch and not sure if it's right, thx!

    opened by andrewliao11 0
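
    There are two problems with the original loop. First, param.grad is None for any parameter that did not take part in the backward pass. Second, Tensor.clamp returns a new tensor instead of modifying the gradient in place, so even where the loop ran it left the gradients unchanged. The suggested torch.nn.utils.clip_grad_norm (now clip_grad_norm_) rescales the total gradient norm. That is a different technique from element-wise clamping, whose built-in equivalent is torch.nn.utils.clip_grad_value_. The following is a minimal sketch using the latter; clipped_step is a hypothetical helper.

```python
import torch
import torch.nn as nn


def clipped_step(model, loss, optimizer, clip_value=1.0):
    optimizer.zero_grad()
    loss.backward()
    # Clamp every existing gradient element to [-clip_value, clip_value] in place.
    # Parameters whose .grad is None are skipped instead of raising AttributeError.
    torch.nn.utils.clip_grad_value_(model.parameters(), clip_value)
    optimizer.step()
```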
Owner
Ilya Kostrikov
Post doc