Code accompanying the paper "Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning" (NeurIPS 2021 Spotlight).

Overview

Implicit Constraint Q-Learning

This is a PyTorch implementation of ICQ on Datasets for Deep Data-Driven Reinforcement Learning (D4RL) and of ICQ-MA on SMAC. The corresponding paper is "Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning".

Requirements

Single-agent:

Please enter the ICQ_mu, ICQ_softmax, ICQ-antmaze_mu and ICQ-antmaze_softmax folders.

Multi-agent:

Please enter the ICQ-MA folder. Then, set up StarCraft II and SMAC:

bash install_sc2.sh

This will download SC2 into the 3rdparty folder and copy over the maps necessary to run.

The requirements.txt file can be used to install the necessary packages into a virtual environment (not recommended).

Quick Start

Single-agent:

$ python3 main.py

Multi-agent:

$ python3 src/main.py --config=offpg_smac --env-config=sc2 with env_args.map_name=3s_vs_3z

The config files act as defaults for an algorithm or environment.

They are all located in src/config. --config refers to the config files in src/config/algs, and --env-config refers to the config files in src/config/envs.
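The relationship between config-file defaults and a command-line override such as env_args.map_name=3s_vs_3z can be pictured as a recursive dictionary merge. The sketch below is illustrative only: the helper names recursive_update and parse_override, and the sample default values (including the "3m" map), are assumptions made for this example and are not taken from the repository.

```python
import copy


def recursive_update(base, overrides):
    """Return a copy of `base` with `overrides` merged in; nested dicts merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = recursive_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def parse_override(arg):
    """Turn 'env_args.map_name=3s_vs_3z' into {'env_args': {'map_name': '3s_vs_3z'}}."""
    dotted_key, value = arg.split("=", 1)
    nested = value  # kept as a string; no type conversion in this sketch
    for part in reversed(dotted_key.split(".")):
        nested = {part: nested}
    return nested


# Hypothetical defaults standing in for a file under src/config/envs.
env_defaults = {"env": "sc2", "env_args": {"map_name": "3m", "difficulty": "7"}}
# Hypothetical defaults standing in for a file under src/config/algs.
alg_defaults = {"name": "offpg_smac", "lr": 0.0005}

config = recursive_update(env_defaults, alg_defaults)
config = recursive_update(config, parse_override("env_args.map_name=3s_vs_3z"))
print(config["env_args"])
```

Only the overridden key changes; sibling keys in env_args keep their default values, and the default dictionaries themselves are left unmodified.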

All results will be stored in the Results folder.

Citing

If you find this open source release useful, please cite our paper (it would be our honor):

@article{yang2021believe,
  title={Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning},
  author={Yang, Yiqin and Ma, Xiaoteng and Li, Chenghao and Zheng, Zewu and Zhang, Qiyuan and Huang, Gao and Yang, Jun and Zhao, Qianchuan},
  journal={arXiv preprint arXiv:2106.03400},
  year={2021}
}

Comments
  • ICQ-MA hangs indefinitely

    Good day, thank you for open sourcing your code. I am very excited to experiment with it.

    I ran the following:

    python3 src/main.py --config=offpg_smac --env-config=sc2 with env_args.map_name=3s_vs_3z

    The code seemed to start but then hung indefinitely. I would really appreciate it if you could help me get the code running.

    This is the only output I get in the terminal:

    INFO - pymarl - Started run with ID "2"
    INFO - my_main - Experiment Parameters:
    INFO - my_main - 
    
    {   'action_selector': 'multinomial',
        'agent': 'rnn',
        'agent_output_type': 'pi_logits',
        'batch_size': 16,
        'batch_size_run': 10,
        'buffer_cpu_only': True,
        'buffer_size': 32,
        'checkpoint_path': '',
        'critic_baseline_fn': 'coma',
        'critic_lr': 0.0001,
        'critic_q_fn': 'coma',
        'critic_train_mode': 'seq',
        'critic_train_reps': 1,
        'env': 'sc2',
        'env_args': {   'continuing_episode': False,
                        'debug': False,
                        'difficulty': '7',
                        'game_version': None,
                        'map_name': '3s_vs_3z',
                        'move_amount': 2,
                        'obs_all_health': True,
                        'obs_instead_of_state': False,
                        'obs_last_action': False,
                        'obs_own_health': True,
                        'obs_pathing_grid': False,
                        'obs_terrain_height': False,
                        'obs_timestep_number': False,
                        'replay_dir': '',
                        'replay_prefix': '',
                        'reward_death_value': 10,
                        'reward_defeat': 0,
                        'reward_negative_scale': 0.5,
                        'reward_only_positive': True,
                        'reward_scale': True,
                        'reward_scale_rate': 20,
                        'reward_sparse': False,
                        'reward_win': 200,
                        'seed': 972368347,
                        'state_last_action': False,
                        'state_timestep_number': False,
                        'step_mul': 8},
        'epsilon_anneal_time': 500000,
        'epsilon_finish': 0.05,
        'epsilon_start': 0.5,
        'evaluate': False,
        'gamma': 0.99,
        'grad_norm_clip': 20,
        'label': 'default_label',
        'learner': 'offpg_learner',
        'learner_log_interval': 20000,
        'load_step': 0,
        'local_results_path': 'results',
        'log_interval': 20000,
        'lr': 0.0005,
        'mac': 'basic_mac',
        'mask_before_softmax': False,
        'mixing_embed_dim': 32,
        'name': 'offpg_smac',
        'obs_agent_id': True,
        'obs_last_action': True,
        'off_batch_size': 32,
        'off_buffer_size': 70000,
        'optim_alpha': 0.99,
        'optim_eps': 1e-05,
        'q_nstep': 0,
        'repeat_id': 1,
        'rnn_hidden_dim': 64,
        'runner': 'parallel',
        'runner_log_interval': 20000,
        'save_model': True,
        'save_model_interval': 1000000,
        'save_replay': False,
        'seed': 972368347,
        'step': 5,
        't_max': 10050000,
        'target_update_interval': 600,
        'tb_lambda': 0.93,
        'td_lambda': 0.8,
        'test_greedy': False,
        'test_interval': 20000,
        'test_nepisode': 20,
        'use_cuda': True,
        'use_tensorboard': True}
    

    When I interrupt the run with Ctrl+C, I get the following:

    Process Process-10:
    Process Process-3:
    Process Process-5:
    Process Process-7:
    Process Process-2:
    Process Process-8:
    Process Process-9:
    Process Process-1:
    Process Process-6:
    Process Process-4:
    Traceback (most recent call last):
      File "/home/claude/miniconda3/envs/icq/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/claude/miniconda3/envs/icq/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/home/claude/Documents/ICQ/ICQ-MA/src/runners/parallel_runner.py", line 228, in env_worker
        cmd, data = remote.recv()
      File "/home/claude/miniconda3/envs/icq/lib/python3.6/multiprocessing/connection.py", line 250, in recv
        buf = self._recv_bytes()
      File "/home/claude/miniconda3/envs/icq/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
        buf = self._recv(4)
      File "/home/claude/miniconda3/envs/icq/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
        chunk = read(handle, remaining)
    KeyboardInterrupt
    
    Traceback (most recent call last):
      File "src/main.py", line 96, in <module>
        ex.run_commandline(params)
      File "/home/claude/miniconda3/envs/icq/lib/python3.6/site-packages/sacred/experiment.py", line 250, in run_commandline
        return self.run(cmd_name, config_updates, named_configs, {}, args)
      File "/home/claude/miniconda3/envs/icq/lib/python3.6/site-packages/sacred/experiment.py", line 199, in run
        run()
      File "/home/claude/miniconda3/envs/icq/lib/python3.6/site-packages/sacred/run.py", line 229, in __call__
        self.result = self.main_function(*args)
      File "/home/claude/miniconda3/envs/icq/lib/python3.6/site-packages/sacred/config/captured_function.py", line 48, in captured_function
        result = wrapped(*args, **kwargs)
      File "src/main.py", line 34, in my_main
        run(_run, config, _log)
      File "/home/claude/Documents/ICQ/ICQ-MA/src/run.py", line 48, in run
        run_sequential(args=args, logger=logger)
      File "/home/claude/Documents/ICQ/ICQ-MA/src/run.py", line 101, in run_sequential
        learner.cuda()
      File "/home/claude/Documents/ICQ/ICQ-MA/src/learners/offpg_learner.py", line 195, in cuda
        self.mac.cuda()
      File "/home/claude/Documents/ICQ/ICQ-MA/src/controllers/basic_controller.py", line 72, in cuda
        self.agent.cuda()
      File "/home/claude/miniconda3/envs/icq/lib/python3.6/site-packages/torch/nn/modules/module.py", line 265, in cuda
        return self._apply(lambda t: t.cuda(device))
      File "/home/claude/miniconda3/envs/icq/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
        module._apply(fn)
      File "/home/claude/miniconda3/envs/icq/lib/python3.6/site-packages/torch/nn/modules/module.py", line 199, in _apply
        param.data = fn(param.data)
      File "/home/claude/miniconda3/envs/icq/lib/python3.6/site-packages/torch/nn/modules/module.py", line 265, in <lambda>
        return self._apply(lambda t: t.cuda(device))
    
    opened by jcformanek 2
  • About the test_bc() function in ica.py in ICA-antmaze_mu

    Hello, while reading your code I ran into a problem: in the test_bc() function in ica.py in ICA-antmaze_mu, tensor_o contains only the observation, but the input to self.vae.decode(tensor_o) should have dimension self.obs_dim+2, which leads to the following error: RuntimeError: size mismatch, m1: [1 x 45], m2: [47 x 750] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:961

    How can this be resolved?

    opened by JinmingM 2
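The error message in this thread reports a [1 x 45] input multiplied by a [47 x 750] weight, so the first decoder layer expects two more input features than tensor_o provides. The sketch below reproduces only that shape arithmetic in plain Python. What the two extra features should contain is not stated in the thread, so the zero placeholders are purely hypothetical and are not a proposed fix; matmul_shape is likewise a name made up for this illustration.

```python
def matmul_shape(left, right):
    """Return the shape of left @ right, or raise if the inner dimensions differ."""
    (rows, inner_left), (inner_right, cols) = left, right
    if inner_left != inner_right:
        raise ValueError(
            f"size mismatch, m1: [{rows} x {inner_left}], m2: [{inner_right} x {cols}]"
        )
    return (rows, cols)


obs_dim = 45                      # width of tensor_o, from the error message
first_layer = (obs_dim + 2, 750)  # weight shape reported as m2

# Feeding the bare observation reproduces the reported mismatch.
try:
    matmul_shape((1, obs_dim), first_layer)
    mismatch = None
except ValueError as err:
    mismatch = str(err)

# Hypothetical: append two placeholder features so the widths agree.
observation = [0.0] * obs_dim
extra = [0.0, 0.0]                # placeholder values; their meaning is unknown here
decoder_input = observation + extra
out_shape = matmul_shape((1, len(decoder_input)), first_layer)
print(mismatch)
print(out_shape)
```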
  • The calculation details of the value function in the advantage function

    The advantage function is defined as A(s,a) = Q(s,a) - V(s). However, V(s) does not seem to be explicitly calculated in the code (here):

    # V(s)? Why? Isn't this the Q-value at the current action?
    pi, logp_pi = self.ac.pi(o)
    q1_pi = self.ac.q1(o, pi)
    q2_pi = self.ac.q2(o, pi)
    v_pi = torch.min(q1_pi, q2_pi)
    
    # Q(s,a)
    q1_old_actions = self.ac.q1(o, data['act'])
    q2_old_actions = self.ac.q2(o, data['act'])
    q_old_actions = torch.min(q1_old_actions, q2_old_actions)
    
    # A(s,a)
    adv_pi = q_old_actions - v_pi
    

    Looking forward to your reply

    opened by TianQi-777 2
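One common reading of the snippet in this question, offered as an assumption rather than as the authors' explanation, is that V(s) is estimated as the expectation of Q(s, a) under the current policy, V(s) = E_{a~pi}[Q(s, a)], using a single sampled action. The plain-Python sketch below uses a made-up two-action policy and Q-table (all names and numbers are hypothetical) to show that a single-sample estimate equals one particular Q(s, a), while the average over many samples approaches the exact expectation.

```python
import random

# Hypothetical two-action example for one fixed state s.
policy = {"left": 0.25, "right": 0.75}  # pi(a | s)
q_values = {"left": 1.0, "right": 3.0}  # Q(s, a)


def exact_value(policy, q_values):
    """V(s) = sum over actions of pi(a|s) * Q(s, a)."""
    return sum(prob * q_values[action] for action, prob in policy.items())


def sampled_value(policy, q_values, n_samples, rng):
    """Monte Carlo estimate of V(s): average Q(s, a) over actions drawn from pi(.|s)."""
    actions = list(policy)
    probs = [policy[a] for a in actions]
    draws = rng.choices(actions, weights=probs, k=n_samples)
    return sum(q_values[a] for a in draws) / n_samples


rng = random.Random(0)
v_exact = exact_value(policy, q_values)
v_single = sampled_value(policy, q_values, 1, rng)    # equals a single Q(s, a)
v_many = sampled_value(policy, q_values, 20000, rng)  # approaches v_exact

# Advantage of a dataset action: A(s, a) = Q(s, a) - V(s).
advantage_left = q_values["left"] - v_exact
print(v_exact, v_single, v_many, advantage_left)
```

Under this reading, Q evaluated at a policy-sampled action is a noisy but unbiased stand-in for V(s), which is different from Q evaluated at the dataset action data['act'].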
  • Tensors on different devices when running ICQ-MA

    Hi,

    Thanks for providing the code.

    When I run python3 src/main.py --config=offpg_smac --env-config=sc2 with env_args.map_name=3s_vs_3z, I get an error about tensors being on different devices, with either torch==1.1.0 or torch==1.10.

    I noticed that off_batch (run.py, line 155) is on the CPU while all the networks are on the GPU. Could you look into this problem?

    Thanks!

    opened by 4ever-Rain 0
  • Unable to run with --config=qmix_smac

    Hi,

    Thanks for providing the code.

    Other configurations (e.g., --config=qmix_smac) cannot run. I was wondering if you plan to fix these other configurations?

    opened by qizhg 1
  • Questions on ICQ_softmax

    Dear author, thanks for making the code available.

    I have two questions regarding ICQ_softmax, where the weight is approximated by the softmax over the minibatch:

    1. Why is len(weights) (e.g., in this line) needed to scale the softmax distribution?
    2. Why is the softmax taken with respect to the TD error in this line, rather than with respect to the Q-value as suggested in the paper?

    Thanks!

    opened by qizhg 1
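On the first question above, one arithmetic fact may help, independent of how the repository uses it: a softmax over a minibatch sums to 1, so each individual weight shrinks as the batch grows, and multiplying by the batch size rescales the weights to have mean 1. The sketch below demonstrates only that property. The names batch_softmax_weights, scores, and alpha are assumptions made for this illustration and are not taken from the ICQ code.

```python
import math


def batch_softmax_weights(scores, alpha=1.0):
    """Softmax of scores / alpha over the minibatch, rescaled by the batch size."""
    scaled = [s / alpha for s in scores]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in scaled]
    total = sum(exps)
    n = len(scores)
    return [n * e / total for e in exps]


small_batch = [0.5, 1.0, -0.2, 2.0]
large_batch = small_batch * 8  # same scores repeated, batch 8 times larger

w_small = batch_softmax_weights(small_batch)
w_large = batch_softmax_weights(large_batch)

mean_small = sum(w_small) / len(w_small)
mean_large = sum(w_large) / len(w_large)
print(mean_small, mean_large)
print(w_small[3], w_large[3])
```

With the rescaling, the weight attached to a given score does not depend on how many copies of the batch are stacked together, so the effective loss scale stays comparable across batch sizes.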
Owner
Yiqin Yang