A PyTorch implementation of the Multi-Agent Deep Deterministic Policy Gradients (MADDPG) algorithm

Overview

Multi-Agent-Deep-Deterministic-Policy-Gradients

This is my implementation of the algorithm presented in the paper Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, which you can find here: https://arxiv.org/pdf/1706.02275.pdf

You will need to install the Multi-Agent Particle Environment (MAPE), which you can find here: https://github.com/openai/multiagent-particle-envs

Make sure to create a virtual environment with the dependencies for MAPE, since they are somewhat out of date. I also recommend running this with PyTorch version 1.4.0, as the latest version (1.8) seems to have an issue with an in-place operation I use in the calculation of the critic loss.

It's probably easiest to just clone this repo into the same directory as the MAPE, as the main file requires the make_env function from that package.
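
For reference, here is a minimal sketch of how the environment is typically created with make_env and stepped with one action per agent. The scenario name and the random actions are placeholders for illustration; main.py may handle observations and actions differently.

    # Minimal sketch; assumes the MAPE repo (multiagent-particle-envs) is on the Python path.
    import numpy as np
    from make_env import make_env

    env = make_env('simple_adversary')   # example scenario name
    n_agents = env.n                     # number of agents in the scenario
    obs = env.reset()                    # list of per-agent observations

    for _ in range(25):
        # One action vector per agent; random actions just to exercise the loop.
        actions = [np.random.uniform(0, 1, env.action_space[i].n)
                   for i in range(n_agents)]
        obs, rewards, dones, info = env.step(actions)
        if any(dones):
            obs = env.reset()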

The accompanying video tutorial can be found here: https://youtu.be/tZTQ6S9PfkE

Comments
  • Question about backward

    Hi Phil,

    I have watched your video on YouTube, but I still have a question about critic_loss.backward(retain_graph=True). Your solution was just to switch the torch version from 1.8.1 to 1.4; I think version 1.4 simply doesn't flag this problem, which is why it runs bug-free there.

    I have looked through a lot of material but I still don't know how to solve it, so I'm turning to you. Here is my traceback:

    File "main.py", line 101, in <module>
        maddpg_agent.learn(memory)
      File "maddpg.py", line 99, in learn
        critic_loss.backward(retain_graph=True)
      File "/usr/local/lib/python3.7/dist-packages/torch/tensor.py", line 245, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 147, in backward
        allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 8]], which is output 0 of TBackward, is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
    
    opened by louieworth 11
  • Found dtype Float but expected Double

    Hello, I'm trying to run the code (after your correction to backward) but I'm getting the following error. I also tried with Python 3.6 (numpy 1.14.5, torch 1.10.1, gym 0.10.5) and I still get the same error:

    critic_loss.backward(retain_graph=True)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/torch/_tensor.py", line 307, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/torch/autograd/__init__.py", line 154, in backward
        Variable._execution_engine.run_backward(
    RuntimeError: Found dtype Float but expected Double
    

    The error seems to be raised from the MADDPG.learn method at critic_loss.backward(retain_graph=True). I checked the "target" variable and it has dtype=torch.float64. Any idea? Thanks.

    opened by jdtotow 1
  • The network parameters of the target critic and critic network

    The line self.target_critic.load_state_dict(critic_state_dict) seems to make the target critic network's parameters always the same as the critic network's. So what is its purpose? To make the network learn more slowly? I hope somebody can help me!

    opened by wallcuber 1
  • usage of critic_value_new[dones[:, 0]] = 0.0 in learn()

    https://github.com/philtabor/Multi-Agent-Deep-Deterministic-Policy-Gradients/blob/a3c294aa6834f348a7401306dff3e67919c861f5/maddpg.py#L74

    Hi Phil,

    Could you please help me understand what this line is for? critic_value_new[dones[:, 0]] = 0.0. Since critic_value_new is a float variable, it cannot be indexed like an array. Should we just set dones[agent_idx] to 0?

    Thanks and Regards Viji

    opened by VijiKK 1
  • Shouldn't it be agent.actor.forward() when calculating actor_loss?

    https://github.com/philtabor/Multi-Agent-Deep-Deterministic-Policy-Gradients/blob/a3c294aa6834f348a7401306dff3e67919c861f5/maddpg.py#L83

    Dear Phil,

    First of all, plenty of thanks and gratitude for your lessons; I've learned a lot from your lectures. I've noticed a discrepancy at line 83 of the MADDPG class while calculating the actor loss: it runs a forward pass of the critic network instead of the actor network. I believe this is a typo; please correct me if I'm wrong.

    Thanks and Regards Viji

    opened by VijiKK 0
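
A few sketches related to the questions above. First, the terminal-state masking asked about in the critic_value_new question: as far as I can tell, the value being masked is the target critic's output flattened over the batch, so it is a 1-D tensor rather than a single float, and indexing it with the boolean column dones[:, 0] zeroes the bootstrap term for transitions that ended the episode. The same kind of in-place assignment is also a plausible trigger for the autograd error in the first question, since newer PyTorch versions refuse to backpropagate through tensors modified in place. The snippet below is a sketch of the pattern with made-up shapes, not the repo's exact code, and I can't promise the torch.where variant is the only change needed on torch 1.8.

    # Sketch: zeroing the bootstrap value for terminal transitions when
    # forming the TD target. Shapes and names are illustrative only.
    import torch as T

    batch_size, n_agents = 64, 3
    gamma = 0.99

    critic_value_ = T.randn(batch_size)    # target critic output, one value per sample
    rewards = T.randn(batch_size)
    dones = T.zeros(batch_size, n_agents, dtype=T.bool)

    # In-place masking (the line the question asks about): dones[:, 0] is a
    # boolean column, so this zeroes the value wherever the episode ended.
    critic_value_[dones[:, 0]] = 0.0
    target = rewards + gamma * critic_value_

    # Non-in-place alternative, which sidesteps "modified by an inplace
    # operation" errors on newer PyTorch versions:
    masked = T.where(dones[:, 0], T.zeros_like(critic_value_), critic_value_)
    target = rewards + gamma * masked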
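
On the "Found dtype Float but expected Double" question: that error usually means float64 numpy arrays (numpy's default) are being mixed with the networks' float32 parameters somewhere between the replay buffer and the loss. Below is a sketch of the usual fix, casting to float32 when converting to tensors or allocating the buffer as float32 in the first place; whether this is the exact cause in the poster's setup is an assumption on my part.

    import numpy as np
    import torch as T

    # numpy defaults to float64, while PyTorch layers default to float32.
    states = np.random.randn(64, 8)                  # dtype float64
    states_t = T.tensor(states, dtype=T.float)       # cast to float32
    rewards_t = T.tensor(np.random.randn(64), dtype=T.float)

    # Alternatively, allocate the replay buffer in float32 to begin with.
    state_memory = np.zeros((100000, 8), dtype=np.float32)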
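
On the target critic question: target networks in DDPG-style algorithms exist to provide slowly moving bootstrap targets. If critic_state_dict is built as a tau-weighted blend of the online and target parameters before being loaded (which is how these soft updates are usually written; I have not quoted the repo's exact update_network_parameters here), then load_state_dict does not copy the critic outright but nudges the target network a small step toward it on every update. A minimal sketch of that soft (Polyak) update:

    import torch.nn as nn

    def soft_update(online_net: nn.Module, target_net: nn.Module, tau: float = 0.01):
        # Blend each parameter: tau * online + (1 - tau) * target.
        online_params = dict(online_net.named_parameters())
        target_params = dict(target_net.named_parameters())

        blended = {name: tau * online_params[name].clone()
                         + (1 - tau) * target_params[name].clone()
                   for name in online_params}

        # With a small tau the target network trails the online network,
        # which stabilizes the bootstrapped critic targets.
        target_net.load_state_dict(blended)   # assumes no extra buffers (plain Linear layers)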
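
On the last question: calling the critic when computing the actor loss is intentional, not a typo. In DDPG and MADDPG the actor is trained by following the gradient of the critic's value with respect to the actor's own proposed actions, so the actor loss is the negated critic output evaluated at mu = actor(obs). A self-contained sketch with made-up layer sizes (not the repo's networks):

    import torch as T
    import torch.nn as nn

    n_agents, obs_dim, act_dim, batch = 3, 8, 5, 64

    actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                          nn.Linear(64, act_dim), nn.Softmax(dim=-1))
    critic = nn.Sequential(nn.Linear(n_agents * (obs_dim + act_dim), 64),
                           nn.ReLU(), nn.Linear(64, 1))
    actor_opt = T.optim.Adam(actor.parameters(), lr=1e-3)

    joint_states = T.randn(batch, n_agents * obs_dim)          # all agents' observations
    agent_obs = T.randn(batch, obs_dim)                        # this agent's own observation
    other_actions = T.randn(batch, (n_agents - 1) * act_dim)   # other agents' actions, held fixed

    mu = actor(agent_obs)                                      # actions proposed by this agent's actor
    joint_actions = T.cat([mu, other_actions], dim=1)

    # Deterministic policy gradient: maximize the critic's score of the
    # actor's own actions, i.e. minimize its negation.
    actor_loss = -critic(T.cat([joint_states, joint_actions], dim=1)).mean()

    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
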
Owner
Phil Tabor
Physicist, Machine Learning Engineer