Asynchronous Advantage Actor-Critic in PyTorch

Overview

This is a PyTorch implementation of A3C, as described in Asynchronous Methods for Deep Reinforcement Learning (Mnih et al., 2016).

Since PyTorch makes it easy to keep model parameters in shared memory across multiple processes, asynchronous methods like A3C are straightforward to implement.
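
For example, the basic pattern looks roughly like the sketch below (not code from this repository; the tiny nn.Linear stands in for the actual actor-critic network): the main process moves the global model's parameters into shared memory, and each worker process syncs its own local copy from them before collecting a rollout.

    import torch.nn as nn
    import torch.multiprocessing as mp

    def worker(shared_model):
        # Each worker keeps a private copy of the network and syncs it from
        # the shared parameters before collecting a rollout.
        local_model = nn.Linear(4, 2)
        local_model.load_state_dict(shared_model.state_dict())
        # ... run the environment with local_model, compute gradients,
        # and apply them to shared_model (see the optimizer notes further down).

    if __name__ == '__main__':
        shared_model = nn.Linear(4, 2)   # stand-in for the actor-critic network
        shared_model.share_memory()      # move its parameters into shared memory
        workers = [mp.Process(target=worker, args=(shared_model,)) for _ in range(4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()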

Requirements

  • PyTorch 0.1.6
  • Python 3.5.2
  • gym 0.7.2

Usage

training

python run_a3c.py --atari

By default, num_process is 8. Set it with python run_a3c.py --num_process 4 to match the number of CPU cores on your machine.

test

After training, run:

python test_a3c.py --render --monitor

Comments
  • Rmsprop statistics sharing problem

    The RMSprop statistics are created in the optimizer's step function, but step is only called after the optimizer has been copied into the worker processes. Because of this, each copy creates its own statistics and puts them into shared memory separately, so nothing is actually shared between workers. Moving the state initialization into __init__ should work (see the shared-optimizer sketch after this comment list). I'm not sure that is good practice, though, since all the PyTorch optimizer implementations initialize their state in step.

    Take this with a grain of salt; I am new to PyTorch.

    opened by dineNshine 2
  • Problem with relu in policy.py ?

    Hi, thanks for releasing this project.

    I get the following error when running run_a3c.py.

    [2017-02-19 18:18:07,137] Making new env: Breakout-v0
    Process Process-1:
    Traceback (most recent call last):
      File "/home/ajay/anaconda3/envs/pyphi/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
        self.run()
      File "/home/ajay/anaconda3/envs/pyphi/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "run_a3c.py", line 38, in train
        p, v = local_policy(Variable(torch.from_numpy(o)).unsqueeze(0))
      File "/home/ajay/anaconda3/envs/pyphi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 210, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/ajay/PythonProjects/pytorch_a3c-master/policy.py", line 61, in forward
        x = F.relu(self.head(x))
      File "/home/ajay/anaconda3/envs/pyphi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 210, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/ajay/anaconda3/envs/pyphi/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 54, in forward
        return self._backend.Linear()(input, self.weight, self.bias)
      File "/home/ajay/anaconda3/envs/pyphi/lib/python3.6/site-packages/torch/nn/_functions/linear.py", line 10, in forward
        output.addmm_(0, 1, input, weight.t())
    TypeError: addmm_ received an invalid combination of arguments - got (int, int, torch.ByteTensor, torch.FloatTensor), but expected one of:
     * (torch.ByteTensor mat1, torch.ByteTensor mat2)
     * (torch.SparseByteTensor mat1, torch.ByteTensor mat2)
     * (int beta, torch.ByteTensor mat1, torch.ByteTensor mat2)
     * (int alpha, torch.ByteTensor mat1, torch.ByteTensor mat2)
     * (int beta, torch.SparseByteTensor mat1, torch.ByteTensor mat2)
     * (int alpha, torch.SparseByteTensor mat1, torch.ByteTensor mat2)
     * (int beta, int alpha, torch.ByteTensor mat1, torch.ByteTensor mat2)
     * (int beta, int alpha, torch.SparseByteTensor mat1, torch.ByteTensor mat2)
    

    Also, can you tell me where reinforce() is defined? It's called on line 80 as:

    a.reinforce(r - v.data.squeeze())

    Thanks for your help

    opened by AjayTalati 2
  • How to modify code for continuous actions?

    Hi @rarilurelo,

    can I ask if you have been able to modify your code to work with continuous actions, e.g. pendulum or mountain car? I tried to modify @ikostrikov's implementation, see here:

    https://discuss.pytorch.org/t/continuous-action-a3c/1033

    but could not get it to work. I think @pfre00 has tried too, but he said training was not stable; see here:

    https://github.com/pfre00/a3c/issues/1

    Have you got any advice?

    Kind regards,

    Ajay

    opened by AjayTalati 3
  • Asy_optimizer

    Hi, thanks for this project. I have a question about the optimizer: is it enough to simply use the original optimizers provided by PyTorch? Before opt.step(), if we simply do something like the following (see also the gradient-sharing sketch after the comment list):

    opt = SGD(shared_para, lr=0.01)
    for l_p, s_p in zip(local_para, shared_para):
         l_p.grad.data = s_p.grad.data
    opt.step()
    

    will this be enough? Thanks!

    opened by ypxie 3
  • expected a Variable arg but got numpy.ndarray error

    I am new to PyTorch; I just cloned your code and ran it, but got an error. I hope you can point me in the right direction to fix this issue.

    More specifics:

    1. Used a conda env with Python 3.6.
    2. Ran 'run_a3c.py' with the default Breakout-v0 env till the end, then ran 'python test_a3c.py --render --monitor --env Breakout-v0'.
    3. Got the error message below:

    File "test_a3c.py", line 71
      test(policy, args)
    File "test_a3c.py", line 25, in test
      p, v = policy(o)
    ...
    File "/home/john/anaconda3/envs/th/lib/python3.6/site-packages/torch/nn/functional.py", line 37, in conv2d
      return f(input, weight, bias) if bias is not None else f(input, weight)
    RuntimeError: expected a Variable argument, but got numpy.ndarray

    Could you tell me what the issue(s) could be here?

    Many thanks,

    John

    opened by dylanthomas 6
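
The sketches below relate to the comments above. None of them are code from this repository, and the helper names are made up for illustration.

For the RMSprop statistics sharing comment: one way to make the statistics truly shared is to subclass torch.optim.RMSprop, create the per-parameter state in __init__, and move it into shared memory before the workers are started. A minimal sketch (the hyperparameter values are only placeholders):

    import torch.optim as optim

    class SharedRMSprop(optim.RMSprop):
        # RMSprop whose square-average buffers live in shared memory, so every
        # worker process updates the same statistics instead of a private copy.
        def __init__(self, params, lr=7e-4, alpha=0.99, eps=1e-5):
            super(SharedRMSprop, self).__init__(params, lr=lr, alpha=alpha, eps=eps)
            for group in self.param_groups:
                for p in group['params']:
                    state = self.state[p]
                    # 'step' is not used in the RMSprop update itself, so a
                    # per-process counter is harmless.
                    state['step'] = 0
                    state['square_avg'] = p.data.new().resize_as_(p.data).zero_()
                    state['square_avg'].share_memory_()

Constructing this optimizer over the shared model's parameters in the main process, before spawning workers, lets every worker update the same buffers.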
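
For the Asy_optimizer question: an ordinary PyTorch optimizer built over the shared parameters is generally enough, but the usual pattern copies gradients from the worker's local model onto the shared model (the opposite direction of the zip in the snippet in that comment), then steps the shared optimizer and re-syncs the local copy. A sketch with a hypothetical helper name:

    def push_gradients(local_model, shared_model):
        # Make each shared parameter's grad point at the gradient computed on
        # the worker's local copy, so the shared optimizer can apply it.
        # (_grad is assigned directly, as in other public A3C implementations.)
        for l_p, s_p in zip(local_model.parameters(), shared_model.parameters()):
            s_p._grad = l_p.grad

    # In a worker, after computing the A3C loss on a rollout:
    #   loss.backward()
    #   push_gradients(local_model, shared_model)
    #   shared_optimizer.step()                                  # built over shared_model.parameters()
    #   local_model.load_state_dict(shared_model.state_dict())   # re-sync before the next rollout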
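
The two error reports above (the addmm_ ByteTensor error in run_a3c.py and the "expected a Variable argument, but got numpy.ndarray" error in test_a3c.py) both look like a raw gym observation reaching the network: Atari frames come back as uint8 ndarrays, so they need to be cast to float and wrapped in a Variable first. A hedged sketch of such a conversion (to_input is a hypothetical name, and the repository's actual preprocessing and input shape may differ):

    import numpy as np
    import torch
    from torch.autograd import Variable

    def to_input(obs):
        # gym's Atari observations are uint8 arrays; cast to float32 (and
        # rescale) so the layers receive a FloatTensor, then wrap the result
        # in a Variable with a leading batch dimension.
        x = np.asarray(obs, dtype=np.float32) / 255.0
        return Variable(torch.from_numpy(x)).unsqueeze(0)

    # p, v = policy(to_input(o))

As for reinforce(): it is not defined in this repository; early PyTorch versions attached it to Variables produced by stochastic functions such as multinomial(), where it feeds the given reward into the REINFORCE gradient estimator during backward().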
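
For the continuous-action question: one common adaptation is to replace the softmax policy head with a diagonal Gaussian, so the network outputs an action mean and a learned log standard deviation, and the action is sampled from the resulting normal distribution; the policy-gradient term then uses the Gaussian log-density of the sampled action instead of a categorical log-probability. GaussianHead below is only a sketch (not part of this repo), and the stability issues mentioned in the linked threads still apply:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GaussianHead(nn.Module):
        # Diagonal-Gaussian policy head for continuous control (e.g. Pendulum).
        def __init__(self, hidden_size, action_size):
            super(GaussianHead, self).__init__()
            self.mu = nn.Linear(hidden_size, action_size)
            self.log_std = nn.Parameter(torch.zeros(action_size))

        def forward(self, h):
            mean = F.tanh(self.mu(h))                 # keep the mean in [-1, 1]
            std = self.log_std.exp().expand_as(mean)  # state-independent std
            return mean, std
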
Owner
Reiji Hatsugai
Graduate School of Information Science and Technology at The University of Tokyo
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

Ilya Kostrikov 3k Dec 31, 2022
Advantage Actor Critic (A2C): jax + flax implementation

Advantage Actor Critic (A2C): jax + flax implementation Current version supports only environments with continious action spaces and was tested on muj

Andrey 3 Jan 23, 2022
Using deep actor-critic model to learn best strategies in pair trading

Deep-Reinforcement-Learning-in-Stock-Trading Using deep actor-critic model to learn best strategies in pair trading Abstract Partially observed Markov

null 281 Dec 9, 2022
Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.

Softlearning Softlearning is a deep reinforcement learning toolbox for training maximum entropy policies in continuous domains. The implementation is

Robotic AI & Learning Lab Berkeley 997 Dec 30, 2022
Multi-task Multi-agent Soft Actor Critic for SMAC

Multi-task Multi-agent Soft Actor Critic for SMAC Overview The CARE formulti-task: Multi-Task Reinforcement Learning with Context-based Representation

RuanJingqing 8 Sep 30, 2022
Implementation of Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

advantage-weighted-regression Implementation of Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, by Peng et al. (

Omar D. Domingues 1 Dec 2, 2021
MiraiML: asynchronous, autonomous and continuous Machine Learning in Python

MiraiML Mirai: future in japanese. MiraiML is an asynchronous engine for continuous & autonomous machine learning, built for real-time usage. Usage In

Arthur Paulino 25 Jul 27, 2022
Distributed Asynchronous Hyperparameter Optimization better than HyperOpt.

UltraOpt : Distributed Asynchronous Hyperparameter Optimization better than HyperOpt. UltraOpt is a simple and efficient library to minimize expensive

null 98 Aug 16, 2022
Distributed Asynchronous Hyperparameter Optimization in Python

Hyperopt: Distributed Hyperparameter Optimization Hyperopt is a Python library for serial and parallel optimization over awkward search spaces, which

null 6.5k Jan 1, 2023
Code for the ECCV2020 paper "A Differentiable Recurrent Surface for Asynchronous Event-Based Data"

A Differentiable Recurrent Surface for Asynchronous Event-Based Data Code for the ECCV2020 paper "A Differentiable Recurrent Surface for Asynchronous

Marco Cannici 21 Oct 5, 2022
Code and datasets for the paper "Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction" (RA-L, 2021)

Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction This is the code for the paper Combining E

Robotics and Perception Group 69 Dec 26, 2022
Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network

Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network This repository is the official implementation of Speech Separati

Kai Li (李凯) 116 Nov 9, 2022
DeepHyper: Scalable Asynchronous Neural Architecture and Hyperparameter Search for Deep Neural Networks

What is DeepHyper? DeepHyper is a software package that uses learning, optimization, and parallel computing to automate the design and development of

DeepHyper Team 214 Jan 8, 2023
An essential implementation of BYOL in PyTorch + PyTorch Lightning

Essential BYOL A simple and complete implementation of Bootstrap your own latent: A new approach to self-supervised Learning in PyTorch + PyTorch Ligh

Enrico Fini 48 Sep 27, 2022
RealFormer-Pytorch Implementation of RealFormer using pytorch

RealFormer-Pytorch Implementation of RealFormer using pytorch. Includes comparison with classical Transformer on image classification task (ViT) wrt C

Simo Ryu 90 Dec 8, 2022
Generic template to bootstrap your PyTorch project with PyTorch Lightning, Hydra, W&B, and DVC.

NN Template Generic template to bootstrap your PyTorch project. Click on Use this Template and avoid writing boilerplate code for: PyTorch Lightning,

Luca Moschella 520 Dec 30, 2022
A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch

This repository holds NVIDIA-maintained utilities to streamline mixed precision and distributed training in Pytorch. Some of the code here will be included in upstream Pytorch eventually. The intention of Apex is to make up-to-date utilities available to users as quickly as possible.

NVIDIA Corporation 6.9k Jan 3, 2023
Objective of the repository is to learn and build machine learning models using Pytorch. 30DaysofML Using Pytorch

30 Days Of Machine Learning Using Pytorch Objective of the repository is to learn and build machine learning models using Pytorch. List of Algorithms

Mayur 119 Nov 24, 2022