PyTorch implementation of Learning with Opponent-Learning Awareness

Overview

LOLA_DiCE

PyTorch implementation of LOLA (Learning with Opponent-Learning Awareness, https://arxiv.org/abs/1709.04326) using DiCE (the Infinitely Differentiable Monte Carlo Estimator, https://arxiv.org/abs/1802.05098).
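
The estimator at the heart of the repo is easiest to see in isolation: wrap the cumulative log-probabilities of the sampled actions in the paper's "magic box" operator, exp(x - x.detach()), and the resulting surrogate objective evaluates to the plain return in the forward pass while remaining differentiable any number of times. A minimal sketch of the idea (function names and tensor shapes are illustrative, not the repo's exact code):

```python
import torch

def magic_box(x):
    # DiCE "magic box": equals 1 in the forward pass, while
    # d/dx magic_box(x) = magic_box(x), so score-function gradients
    # of any order flow through the sampling log-probabilities.
    return torch.exp(x - x.detach())

def dice_objective(logprobs, rewards, gamma=0.96):
    # logprobs, rewards: (batch, time) tensors; logprobs are the
    # log-probabilities of the actions actually sampled at each step.
    deps = torch.cumsum(logprobs, dim=1)  # causal dependencies of each reward
    discounts = gamma ** torch.arange(rewards.size(1), dtype=rewards.dtype)
    # Forward pass: the ordinary mean discounted return. Backward pass:
    # the DiCE gradient estimator (maximize this, or minimize its negation).
    return torch.mean(torch.sum(magic_box(deps) * discounts * rewards, dim=1))
```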

Quick results:

Results on the Iterated Prisoner's Dilemma (IPD) using DiCE

[lr_in=0.3, lr_out=0.2, lr_v=0.1, batch_size=128, len_rollout=150, use_baseline=True] (figure: ipd_with_dice)

Results on IPD using DiCE and opponent modelling

[lr_in=0.3, lr_out=0.2, lr_v=0.1, batch_size=128, len_rollout=150, use_baseline=True] (figure: ipd_with_dice) (With this set of hyperparameters, 2 lookaheads appears to be the most stable setting.)
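
With opponent modelling, an agent no longer reads the opponent's true parameters during the inner lookahead steps; instead it differentiates through an estimate fitted to the opponent's observed behaviour. A hypothetical maximum-likelihood sketch for a tabular IPD policy (all names and the 1 = cooperate encoding are assumptions, not the repo's code):

```python
import torch

def estimate_opponent(states, actions, n_steps=200, lr=0.1):
    # Hypothetical sketch: fit 5 cooperation logits (start, CC, CD, DC, DD)
    # to the opponent's observed (state, action) pairs by maximum likelihood.
    # states: long tensor of state indices; actions: float tensor (1 = cooperate).
    theta_hat = torch.zeros(5, requires_grad=True)
    optimizer = torch.optim.Adam([theta_hat], lr=lr)
    for _ in range(n_steps):
        probs = torch.sigmoid(theta_hat[states])
        nll = -(actions * torch.log(probs)
                + (1 - actions) * torch.log(1 - probs)).mean()
        optimizer.zero_grad()
        nll.backward()
        optimizer.step()
    return theta_hat.detach()  # plug into the lookahead in place of true params
```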

Results on IPD using exact gradients

[lr_in=0.3, lr_out=0.2, batch_size=128, len_rollout=150] (figure: ipd_with_exact_grads)

Results on IPD using exact gradients and opponent modelling

[lr_in=0.3, lr_out=0.2, batch_size=128, len_rollout=150] (figure: ipd_with_exact_grads)
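
The exact-gradient variants need no rollouts at all: with tabular policies, the IPD return is a closed-form function of both agents' parameters, so every lookahead derivative is exact. A sketch of that computation under the standard 5-state parameterisation (payoffs follow the usual IPD convention; this mirrors the idea rather than the repo's exact code):

```python
import torch

def exact_ipd_returns(theta1, theta2, gamma=0.96):
    # Tabular IPD policies: 5 logits each, giving P(cooperate) in the
    # start state and after each outcome (CC, CD, DC, DD).
    p1, p2 = torch.sigmoid(theta1), torch.sigmoid(theta2)

    def joint(a, b):
        # Distribution over (CC, CD, DC, DD) given both cooperation probs.
        return torch.stack([a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)])

    p0 = joint(p1[0], p2[0])                                   # initial states
    P = torch.stack([joint(p1[s + 1], p2[s + 1]) for s in range(4)])

    # Discounted state visitation: d = p0^T (I - gamma*P)^-1, differentiable
    # in both theta1 and theta2, so lookahead gradients are exact.
    d = p0 @ torch.inverse(torch.eye(4) - gamma * P)
    r1 = torch.tensor([-1., -3., 0., -2.])   # player-1 payoff per state
    r2 = torch.tensor([-1., 0., -3., -2.])   # player-2 payoff per state
    # Scale by (1 - gamma) to report average per-step reward.
    return (1 - gamma) * (d @ r1), (1 - gamma) * (d @ r2)
```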

Authors' version:

The authors of the paper have their own (TensorFlow) version available here: https://github.com/alshedivat/lola

Comments
  • Guessing the seed hyperparameter

    https://github.com/alexis-jacq/LOLA_DiCE/blob/ec3f3f620a67df0a3e72f8e9227d09e4543dbb99/ipd_DiCE.py#L12-L22 @alexis-jacq Thanks a lot for such a readable implementation! I ran LOLA-DiCE with num_lookaheads = 1 for 300 iterations with different seeds (the rest of the hyperparameters were the same). The seed used in the experiments in this repository is seed = 42. I generated each seed with random.randint(0, 100) and obtained the following results (the figures say lookahead = 0 in the legend, but that is a typo; it is actually lookahead = 1):

    Results for seeds 42, 15, 45, and 64 (figures: seed42, seed15, seed45, seed64).

    The following is the result when I simply remove the seed hyperparameter in https://github.com/alexis-jacq/LOLA_DiCE/blob/ec3f3f620a67df0a3e72f8e9227d09e4543dbb99/ipd_DiCE.py#L191 (figure: noseed):

    This shows that the results are reproducible only with seed = 42. But what if we do not know this? In that case I cannot produce results consistent with the paper, and searching for an appropriate seed (out of infinitely many integers) would make the implementation overly sensitive to a single hyperparameter, regardless of the algorithm being theoretically sound. What if I want to implement the same algorithm in a different environment? The DiCE paper, by contrast, shows IPD results averaged across 5 runs converging to an average return of -1 (page 8 of the DiCE paper). Also, where does the randomness come from? The parameters theta and values are always initialized to zeros, and all other operations (apart from actions = m.sample() in line 80) are deterministic.
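
    One way to make this sensitivity visible is to sweep seeds explicitly and report the spread instead of a single curve, as the DiCE paper does with its 5-run averages. A hypothetical harness (assuming a play(seed, n_iters) entry point that returns per-iteration rewards):

```python
import numpy as np
import torch

def sweep_seeds(play, seeds=(0, 1, 2, 3, 4), n_iters=300):
    # Hypothetical harness: `play` is an assumed entry point that runs
    # the training loop for one seed and returns per-iteration rewards.
    finals = []
    for seed in seeds:
        torch.manual_seed(seed)   # covers the only stochastic op, m.sample()
        np.random.seed(seed)
        finals.append(play(seed=seed, n_iters=n_iters)[-1])
    finals = np.asarray(finals)
    print(f"final reward: {finals.mean():.3f} +/- {finals.std():.3f} "
          f"over {len(seeds)} seeds")
```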

    Thanks in advance!

    opened by vrn25 5
  • value.detach() in dice_objective function

    Hello. First of all, thank you for sharing this code. It greatly helped me understand the paper more deeply! :+1:

    I noticed that the value function is used to reduce the variance of the DiCE objective. Since the value-function loss is computed separately at line 148 and optimized by the value_update function, shouldn't the values be detached in the dice_objective function at line 47 (i.e., values = torch.stack(self.values, dim=1).detach())? Because the values are not detached, the DiCE loss includes the computation graph of the value function, and I noticed that the value function can therefore also be updated via the theta_update function.

    Thank you for your time and consideration! :-)

    https://github.com/alexis-jacq/LOLA_DiCE/blob/ec3f3f620a67df0a3e72f8e9227d09e4543dbb99/ipd_DiCE.py#L148

    https://github.com/alexis-jacq/LOLA_DiCE/blob/ec3f3f620a67df0a3e72f8e9227d09e4543dbb99/ipd_DiCE.py#L47
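
    For reference, this is what the suggested fix looks like in isolation: the baseline values enter the objective through the (1 - magic_box) correction term of DiCE, but detached, so the policy update cannot push gradients into the value network. A sketch of the proposed behaviour, not the repo's current code:

```python
import torch

def dice_objective_with_baseline(logprobs, rewards, values, gamma=0.96):
    # logprobs, rewards, values: (batch, time) tensors.
    def magic_box(x):
        return torch.exp(x - x.detach())

    values = values.detach()            # the suggested fix: stop gradients here
    discounts = gamma ** torch.arange(rewards.size(1), dtype=rewards.dtype)
    deps = torch.cumsum(logprobs, dim=1)
    objective = torch.sum(magic_box(deps) * discounts * rewards, dim=1)
    # DiCE baseline: (1 - magic_box(log pi_t)) * b_t is zero in the forward
    # pass but reduces the variance of the gradient estimate.
    baseline = torch.sum((1 - magic_box(logprobs)) * discounts * values, dim=1)
    return torch.mean(objective + baseline)
```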

    opened by dkkim93 0
  • Incorrect calls to the game environment

    There is a bug in the following lines:

    https://github.com/alexis-jacq/LOLA_DiCE/blob/ec3f3f620a67df0a3e72f8e9227d09e4543dbb99/ipd_DiCE_om.py#L148-L150

    https://github.com/alexis-jacq/LOLA_DiCE/blob/ec3f3f620a67df0a3e72f8e9227d09e4543dbb99/ipd_DiCE_om.py#L161-L163

    In both cases, the agent object passes its action in the first position.

    This is correct when the call happens in the context of the first agent; for the second agent, however, the actions should be passed in the opposite order.
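
    In other words, the environment consumes actions in a fixed (player 1, player 2) order, so the call site has to account for which seat the calling agent occupies. A hypothetical wrapper illustrating the fix (names and the env.step signature are illustrative):

```python
def step_as(env, agent_id, my_action, other_action):
    # Hypothetical wrapper: the env expects actions in a fixed
    # (player-1, player-2) order, so swap them when agent 2 is the caller.
    if agent_id == 0:
        return env.step((my_action, other_action))
    return env.step((other_action, my_action))
```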

    opened by jleni 2
  • Inconsistent results

    First of all, thank you for providing such a readable public implementation of an interesting paper!

    Unfortunately, I've found (using PyTorch 0.4.1) that the results for ipd_DiCE.py vary significantly between runs (you define a seed in the Hp() class, but you don't seem to set it anywhere). Out of 3 runs, 2 converged to a (-1.9, -1.9) reward after 200 updates (the 'all defect' policy), and only one run got down to about a (-1.03, -1.28) reward.

    Any idea why this could be the case? Are you also experiencing this?

    (LOLA with exact gradients in IPD_ex.py does seem to work though!)
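
    Since Hp() stores a seed that is never applied, and the only stochastic operation is the action sampling, seeding PyTorch once at startup should at least make individual runs repeatable. A sketch of the missing call (the Hp stub below just mirrors the repo's hyperparameter class):

```python
import torch

class Hp:
    # Stub standing in for the repo's hyperparameter class.
    seed = 42

hp = Hp()
torch.manual_seed(hp.seed)  # the missing call: makes m.sample() repeatable
```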

    opened by ryan-lowe 7
Owner: Alexis David Jacq