PyTorch implementation of Learning with Opponent-Learning Awareness

Alexis David Jacq

Last update: Sep 15, 2022

Overview

LOLA_DiCE

PyTorch implementation of LOLA (https://arxiv.org/abs/1709.04326) using DiCE (https://arxiv.org/abs/1802.05098).

Quick results:

Results on IPD using DiCE

[lr_in=0.3, lr_out=0.2, lr_v=0.1, batch_size=128, len_rollout=150, use_baseline=True]

Results on IPD using DiCE and opponent modelling

[lr_in=0.3, lr_out=0.2, lr_v=0.1, batch_size=128, len_rollout=150, use_baseline=True] (With this set of hyperparameters, 2 lookaheads seems to be the most stable setting.)

Results on IPD using exact gradients

[lr_in=0.3, lr_out=0.2, batch_size=128, len_rollout=150]

Results on IPD using exact gradients and opponent modelling

[lr_in=0.3, lr_out=0.2, batch_size=128, len_rollout=150]

Authors' version:

The authors of the paper have their own version (TensorFlow), available here: https://github.com/alshedivat/lola

Comments

Guessing the seed hyperparameter

https://github.com/alexis-jacq/LOLA_DiCE/blob/ec3f3f620a67df0a3e72f8e9227d09e4543dbb99/ipd_DiCE.py#L12-L22

@alexis-jacq Thanks a lot for such a readable implementation! I ran LOLA-DiCE with num_lookaheads = 1 for 300 iterations using different seeds, keeping the rest of the hyperparameters the same. The hyperparameters used in the experiments in this repository correspond to seed = 42. I generated the seeds with the random.randint(0,100) function and got the following results (the figures show lookahead = 0 in the legend, but that is a typo; it is actually lookahead = 1):

[Figures: results for seed 42, seed 15, seed 45, and seed 64. The seed in each experiment was generated using the random.randint(0,100) function.]

The following is the result when I simply remove the seed as a hyperparameter at https://github.com/alexis-jacq/LOLA_DiCE/blob/ec3f3f620a67df0a3e72f8e9227d09e4543dbb99/ipd_DiCE.py#L191

It shows that the results are reproducible only with seed = 42. But what if we do not know this? In that case, I am not able to produce results consistent with the paper. Having to find an appropriate seed (out of infinitely many integers) would make this implementation too sensitive to one hyperparameter, regardless of whether it is theoretically sound. What if I want to implement the same algorithm in a different environment?

The DiCE paper, however, shows IPD results across 5 runs converging to an average return of -1 (page 8 of the DiCE paper). Also, where does the randomness come from? The parameters theta and values are always initialized to 0s. All other operations (apart from actions = m.sample() in line 80) are deterministic.

Thanks in advance!
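One way to make the seed sensitivity described in this comment measurable is to report statistics over several seeds instead of a single run. The following is only a minimal sketch: `run_experiment` is a hypothetical stand-in that returns a toy final return, not the repository's training loop, and the seed list simply reuses the seeds mentioned above.

```python
import random
import statistics


def run_experiment(seed: int) -> float:
    """Hypothetical stand-in for one training run; returns a toy final return."""
    rng = random.Random(seed)  # all randomness flows from the seed
    # Toy outcome: lands near -1 or near -2, plus a little noise.
    return rng.choice([-1.0, -2.0]) + rng.uniform(-0.05, 0.05)


def summarize(seeds):
    """Mean and standard deviation of the final return across seeds."""
    returns = [run_experiment(s) for s in seeds]
    return statistics.mean(returns), statistics.stdev(returns)


seeds = [42, 15, 45, 64]
mean_ret, std_ret = summarize(seeds)
print(f"mean={mean_ret:.3f} std={std_ret:.3f} over {len(seeds)} seeds")
```

A large standard deviation across seeds would point to the kind of run-to-run variation reported here, independent of which single seed happens to work.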

opened by vrn25 5

value.detach() in dice_objective function

Hello. First of all, thank you for sharing this code. It greatly helped me understand the paper more deeply! :+1:

I noticed that the value function is used to reduce the variance of the DiCE objective. Because the loss for the value function is computed separately in line 148 and optimized by the value_update function, shouldn't the value be detached in line 47 of the dice_objective function (i.e., values = torch.stack(self.values, dim=1).detach())? Because the value function is not detached, the DiCE loss includes the computation graph of the value function, so the value function can also be updated via the theta_update function.

Thank you for your time and consideration! :-)

https://github.com/alexis-jacq/LOLA_DiCE/blob/ec3f3f620a67df0a3e72f8e9227d09e4543dbb99/ipd_DiCE.py#L148

https://github.com/alexis-jacq/LOLA_DiCE/blob/ec3f3f620a67df0a3e72f8e9227d09e4543dbb99/ipd_DiCE.py#L47
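The effect described in this comment can be reproduced on a toy problem. The sketch below is not the repository's code: the objective is a simplified DiCE-style objective with a baseline term, and `theta`, `values`, `rewards`, and `value_grad_through_lookahead` are hypothetical names. It checks whether the value parameters receive a gradient through a differentiable lookahead step, with and without `.detach()`.

```python
import torch
import torch.nn.functional as F


def magic_box(x):
    # DiCE operator: evaluates to 1 but carries the gradient of x.
    return torch.exp(x - x.detach())


def value_grad_through_lookahead(detach_values, lr_in=0.3):
    theta = torch.zeros(3, requires_grad=True)    # toy policy parameters
    values = torch.zeros(3, requires_grad=True)   # toy baseline values
    rewards = torch.tensor([-1.0, -3.0, 0.0])     # toy rewards

    log_probs = F.logsigmoid(theta)
    baseline = values.detach() if detach_values else values
    objective = (magic_box(log_probs) * rewards).sum() \
        + ((1 - magic_box(log_probs)) * baseline).sum()

    # Inner (lookahead-style) step, kept differentiable.
    (inner_grad,) = torch.autograd.grad(objective, theta, create_graph=True)
    new_theta = theta + lr_in * inner_grad

    # Gradient of a quantity computed after the step, w.r.t. the values.
    (grad_values,) = torch.autograd.grad(
        new_theta.sum(), values, allow_unused=True
    )
    return grad_values


print(value_grad_through_lookahead(detach_values=False))  # a tensor
print(value_grad_through_lookahead(detach_values=True))   # None
```

In this toy setup the first-order gradient of the baseline term with respect to `values` is numerically zero, because `1 - magic_box(...)` evaluates to 0. The values only pick up a gradient through the differentiated inner step, which is the path that detaching cuts.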

opened by dkkim93 0

Incorrect calls to the game environment

There is a bug in the following lines:

https://github.com/alexis-jacq/LOLA_DiCE/blob/ec3f3f620a67df0a3e72f8e9227d09e4543dbb99/ipd_DiCE_om.py#L148-L150

https://github.com/alexis-jacq/LOLA_DiCE/blob/ec3f3f620a67df0a3e72f8e9227d09e4543dbb99/ipd_DiCE_om.py#L161-L163

In both cases, the agent object passes its action in the first position.

This is correct when the call happens in the context of the first agent, but it is incorrect for the second agent.
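The ordering problem can be illustrated with a toy two-player step function. This is a minimal sketch under stated assumptions: `step(action_1, action_2)` and `step_from` are hypothetical names rather than the repository's environment API, and the payoff table is assumed to be the usual IPD one.

```python
# Assumed payoffs indexed by (action_1, action_2); 0 = cooperate, 1 = defect.
PAYOFFS = {
    (0, 0): (-1, -1),
    (0, 1): (-3, 0),
    (1, 0): (0, -3),
    (1, 1): (-2, -2),
}


def step(action_1, action_2):
    """Hypothetical environment step: returns (reward_1, reward_2)."""
    return PAYOFFS[(action_1, action_2)]


def step_from(agent_index, own_action, other_action):
    """Place the caller's action in the slot that matches its player index."""
    if agent_index == 0:
        return step(own_action, other_action)
    return step(other_action, own_action)


# Scenario: agent 2 defects while agent 1 cooperates.
buggy = step(1, 0)           # agent 2 passes its own action in the first slot
fixed = step_from(1, 1, 0)   # actions routed by player index
print(buggy, fixed)
```

With the arguments swapped, the rewards are attributed to the wrong players whenever the two actions differ, which is why the call is only correct from the first agent's point of view.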

opened by jleni 2

Inconsistent results

First of all, thank you for providing such a readable public implementation of an interesting paper!

Unfortunately, I've found (using PyTorch 4.1) that the results for ipd_DiCE.py seem to vary significantly between runs (you define a seed in the Hp() class, but you don't seem to set it anywhere). Out of 3 runs, 2 converged to (-1.9, -1.9) reward after 200 updates (the 'all defect' policy), and only one got down to about (-1.03, -1.28) reward.

Any idea why this could be the case? Are you also experiencing this?

(LOLA with exact gradients in IPD_ex.py does seem to work though!)
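For the seed that is defined but never applied, a minimal sketch of seeding the random number generators before sampling is shown below. The helper names `seed_everything` and `sample_actions` are hypothetical and do not exist in the repository; the Bernoulli sampling merely stands in for the `m.sample()` call mentioned in the comments above.

```python
import random

import torch


def seed_everything(seed: int) -> None:
    """Seed Python's and PyTorch's global random number generators."""
    random.seed(seed)
    torch.manual_seed(seed)


def sample_actions(seed: int, n: int = 8) -> torch.Tensor:
    """Draw n Bernoulli actions after seeding (toy stand-in for a rollout)."""
    seed_everything(seed)
    probs = torch.sigmoid(torch.zeros(n))  # zero-initialized parameters, p = 0.5
    return torch.distributions.Bernoulli(probs=probs).sample()


a = sample_actions(42)
b = sample_actions(42)
print(torch.equal(a, b))  # same seed, same sampled actions
```

Without a call such as `torch.manual_seed`, each run draws different actions even though the parameters start from the same values, which would be consistent with the variation between runs reported here.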

opened by ryan-lowe 7
