Code accompanying the paper "Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning" (NeurIPS 2021 Spotlight)

Last update: Dec 23, 2022

Related tags

Deep Learning ICQ

Overview

Implicit Constraint Q-Learning

This is a PyTorch implementation of ICQ on Datasets for Deep Data-Driven Reinforcement Learning (D4RL) and of ICQ-MA on SMAC. The corresponding paper is Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning.

Requirements

Single-agent:

Work from one of the ICQ_mu, ICQ_softmax, ICQ-antmaze_mu, or ICQ-antmaze_softmax folders.

python=3.6.5
Datasets for Deep Data-Driven Reinforcement Learning (D4RL)
torch=1.1.0

Multi-agent:

Please enter the ICQ-MA folder. Then, set up StarCraft II and SMAC:

bash install_sc2.sh

This will download SC2 into the 3rdparty folder and copy over the maps needed to run the experiments.

The requirements.txt file can be used to install the necessary packages into a virtual environment (not recommended).

Quick Start

Single-agent:

$ python3 main.py

Multi-agent:

$ python3 src/main.py --config=offpg_smac --env-config=sc2 with env_args.map_name=3s_vs_3z

The config files act as defaults for an algorithm or environment. They are all located in src/config.

--config refers to the config files in src/config/algs.
--env-config refers to the config files in src/config/envs.
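
As a rough illustration of how such layered defaults might combine, the sketch below merges a default config, an environment config, an algorithm config, and command-line overrides in order, with later layers winning. The function name `merge_configs` and all sample values are hypothetical; this is not the repository's actual loading code.

```python
import copy


def merge_configs(base, override):
    """Recursively merge `override` into a copy of `base`; `override` wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections such as env_args key by key
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical stand-ins for the default, --env-config and --config files
default_cfg = {"use_cuda": True, "t_max": 10000, "env_args": {}}
env_cfg = {"env": "sc2", "env_args": {"map_name": "3m", "difficulty": "7"}}
alg_cfg = {"name": "offpg_smac", "learner": "offpg_learner"}
# Hypothetical stand-in for `with env_args.map_name=3s_vs_3z` on the command line
cli_overrides = {"env_args": {"map_name": "3s_vs_3z"}}

config = default_cfg
for layer in (env_cfg, alg_cfg, cli_overrides):
    config = merge_configs(config, layer)

print(config["env_args"])
```

In this sketch the command-line value replaces only `map_name`, while the other environment defaults are left untouched.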

All results will be stored in the Results folder.

Citing

If you find this open-source release useful, please cite it in your paper (we would be honored):

@article{yang2021believe,
  title={Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning},
  author={Yang, Yiqin and Ma, Xiaoteng and Li, Chenghao and Zheng, Zewu and Zhang, Qiyuan and Huang, Gao and Yang, Jun and Zhao, Qianchuan},
  journal={arXiv preprint arXiv:2106.03400},
  year={2021}
}

Note

If you have any questions, please contact me: [email protected].
The implementation is based on the open-source PyMARL, SMAC, and DOP codebases.

Comments

ICQ-MA hangs indefinitely

Good day, thank you for open sourcing your code. I am very excited to experiment with it.

I ran the following:

python3 src/main.py --config=offpg_smac --env-config=sc2 with env_args.map_name=3s_vs_3z

The code seemed to start but then hung indefinitely. I would really appreciate it if you could help me get the code running.

This is the only output I get in the terminal:

INFO - pymarl - Started run with ID "2"
INFO - my_main - Experiment Parameters:
INFO - my_main - 

{   'action_selector': 'multinomial',
    'agent': 'rnn',
    'agent_output_type': 'pi_logits',
    'batch_size': 16,
    'batch_size_run': 10,
    'buffer_cpu_only': True,
    'buffer_size': 32,
    'checkpoint_path': '',
    'critic_baseline_fn': 'coma',
    'critic_lr': 0.0001,
    'critic_q_fn': 'coma',
    'critic_train_mode': 'seq',
    'critic_train_reps': 1,
    'env': 'sc2',
    'env_args': {   'continuing_episode': False,
                    'debug': False,
                    'difficulty': '7',
                    'game_version': None,
                    'map_name': '3s_vs_3z',
                    'move_amount': 2,
                    'obs_all_health': True,
                    'obs_instead_of_state': False,
                    'obs_last_action': False,
                    'obs_own_health': True,
                    'obs_pathing_grid': False,
                    'obs_terrain_height': False,
                    'obs_timestep_number': False,
                    'replay_dir': '',
                    'replay_prefix': '',
                    'reward_death_value': 10,
                    'reward_defeat': 0,
                    'reward_negative_scale': 0.5,
                    'reward_only_positive': True,
                    'reward_scale': True,
                    'reward_scale_rate': 20,
                    'reward_sparse': False,
                    'reward_win': 200,
                    'seed': 972368347,
                    'state_last_action': False,
                    'state_timestep_number': False,
                    'step_mul': 8},
    'epsilon_anneal_time': 500000,
    'epsilon_finish': 0.05,
    'epsilon_start': 0.5,
    'evaluate': False,
    'gamma': 0.99,
    'grad_norm_clip': 20,
    'label': 'default_label',
    'learner': 'offpg_learner',
    'learner_log_interval': 20000,
    'load_step': 0,
    'local_results_path': 'results',
    'log_interval': 20000,
    'lr': 0.0005,
    'mac': 'basic_mac',
    'mask_before_softmax': False,
    'mixing_embed_dim': 32,
    'name': 'offpg_smac',
    'obs_agent_id': True,
    'obs_last_action': True,
    'off_batch_size': 32,
    'off_buffer_size': 70000,
    'optim_alpha': 0.99,
    'optim_eps': 1e-05,
    'q_nstep': 0,
    'repeat_id': 1,
    'rnn_hidden_dim': 64,
    'runner': 'parallel',
    'runner_log_interval': 20000,
    'save_model': True,
    'save_model_interval': 1000000,
    'save_replay': False,
    'seed': 972368347,
    'step': 5,
    't_max': 10050000,
    'target_update_interval': 600,
    'tb_lambda': 0.93,
    'td_lambda': 0.8,
    'test_greedy': False,
    'test_interval': 20000,
    'test_nepisode': 20,
    'use_cuda': True,
    'use_tensorboard': True}

When I interrupt the terminal with ctrl+c I get the following:

Process Process-10:
Process Process-3:
Process Process-5:
Process Process-7:
Process Process-2:
Process Process-8:
Process Process-9:
Process Process-1:
Process Process-6:
Process Process-4:
Traceback (most recent call last):
  File "/home/claude/miniconda3/envs/icq/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/claude/miniconda3/envs/icq/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/claude/Documents/ICQ/ICQ-MA/src/runners/parallel_runner.py", line 228, in env_worker
    cmd, data = remote.recv()
  File "/home/claude/miniconda3/envs/icq/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/claude/miniconda3/envs/icq/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/claude/miniconda3/envs/icq/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt

Traceback (most recent call last):
  File "src/main.py", line 96, in <module>
    ex.run_commandline(params)
  File "/home/claude/miniconda3/envs/icq/lib/python3.6/site-packages/sacred/experiment.py", line 250, in run_commandline
    return self.run(cmd_name, config_updates, named_configs, {}, args)
  File "/home/claude/miniconda3/envs/icq/lib/python3.6/site-packages/sacred/experiment.py", line 199, in run
    run()
  File "/home/claude/miniconda3/envs/icq/lib/python3.6/site-packages/sacred/run.py", line 229, in __call__
    self.result = self.main_function(*args)
  File "/home/claude/miniconda3/envs/icq/lib/python3.6/site-packages/sacred/config/captured_function.py", line 48, in captured_function
    result = wrapped(*args, **kwargs)
  File "src/main.py", line 34, in my_main
    run(_run, config, _log)
  File "/home/claude/Documents/ICQ/ICQ-MA/src/run.py", line 48, in run
    run_sequential(args=args, logger=logger)
  File "/home/claude/Documents/ICQ/ICQ-MA/src/run.py", line 101, in run_sequential
    learner.cuda()
  File "/home/claude/Documents/ICQ/ICQ-MA/src/learners/offpg_learner.py", line 195, in cuda
    self.mac.cuda()
  File "/home/claude/Documents/ICQ/ICQ-MA/src/controllers/basic_controller.py", line 72, in cuda
    self.agent.cuda()
  File "/home/claude/miniconda3/envs/icq/lib/python3.6/site-packages/torch/nn/modules/module.py", line 265, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/claude/miniconda3/envs/icq/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
    module._apply(fn)
  File "/home/claude/miniconda3/envs/icq/lib/python3.6/site-packages/torch/nn/modules/module.py", line 199, in _apply
    param.data = fn(param.data)
  File "/home/claude/miniconda3/envs/icq/lib/python3.6/site-packages/torch/nn/modules/module.py", line 265, in <lambda>
    return self._apply(lambda t: t.cuda(device))

opened by jcformanek 2

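About the test_bc() function in ica.py in ICA-antmaze_mu

Hello, while reading through your code I ran into a problem: in the test_bc() function in ica.py in ICA-antmaze_mu, tensor_o contains only the observation, but self.vae.decode(tensor_o) expects an input of dimension self.obs_dim+2, which leads to the following error:

RuntimeError: size mismatch, m1: [1 x 45], m2: [47 x 750] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:961

How can this be resolved?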

opened by JinmingM 2

The calculation details of the value function in the advantage function

The advantage function is defined as A(s,a) = Q(s,a) - V(s). However, V(s) does not seem to be explicitly calculated in the code:

# V(s): why? Isn't this the Q-value at the current action?
pi, logp_pi = self.ac.pi(o)
q1_pi = self.ac.q1(o, pi)
q2_pi = self.ac.q2(o, pi)
v_pi = torch.min(q1_pi, q2_pi)

# Q(s,a)
q1_old_actions = self.ac.q1(o, data['act'])
q2_old_actions = self.ac.q2(o, data['act'])
q_old_actions = torch.min(q1_old_actions, q2_old_actions)

# A(s,a)
adv_pi = q_old_actions - v_pi

Looking forward to your reply

opened by TianQi-777 2
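
One possible reading of the snippet in the question above, not confirmed by the authors: since V(s) is the expectation of Q(s,a) over actions drawn from the policy, evaluating Q at a single action sampled from the current policy gives a one-sample estimate of V(s), while Q at the dataset action plays the role of Q(s,a). The toy sketch below checks this identity on one state with three discrete actions; all numbers are made up for illustration.

```python
import random

# Made-up Q-values and policy probabilities for one state with three actions
q_values = [1.0, 2.0, 4.0]
policy = [0.5, 0.3, 0.2]

# Exact state value: V(s) = sum_a pi(a|s) * Q(s, a)
v_exact = sum(p * q for p, q in zip(policy, q_values))


def sample_v_estimate(rng):
    """Return Q(s, a') for one action a' drawn from the policy."""
    action = rng.choices(range(len(policy)), weights=policy, k=1)[0]
    return q_values[action]


# Averaging many one-sample estimates approaches the exact V(s)
rng = random.Random(0)
n_samples = 20000
v_monte_carlo = sum(sample_v_estimate(rng) for _ in range(n_samples)) / n_samples

# Advantage of a (hypothetical) dataset action: A(s, a) = Q(s, a) - V(s)
dataset_action = 2
advantage = q_values[dataset_action] - v_exact

print(v_exact, v_monte_carlo, advantage)
```

A single sampled action is a noisy estimate of V(s); it is unbiased only when the action really is drawn from the policy whose value is being estimated.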

Tensor in different devices when run ICQ-MA

Hi,

Thanks for providing the code.

When I run python3 src/main.py --config=offpg_smac --env-config=sc2 with env_args.map_name=3s_vs_3z, I get an error about tensors being on different devices, whether I use torch==1.1.0 or torch==1.10.

I noticed that off_batch (in run.py, line 155) is on the CPU while all the networks are on the GPU. Could you look into this problem?

Thanks!

opened by 4ever-Rain 0

Unable to run with --config=qmix_smac

Hi,

Thanks for providing the code.

Other configurations (e.g., --config=qmix_smac) cannot run. I was wondering if you plan to fix these other configurations?

opened by qizhg 1

Questions on ICQ_softmax

Dear author, thanks for making the code available.

I have two questions regarding ICQ_softmax, where the weight is approximated by the softmax over the minibatch:

Why is len(weights) (e.g., in this line) needed to scale the softmax distribution?

Why is the softmax performed with respect to the TD error in this line, instead of with respect to the Q value as suggested in the paper?

Thanks!

opened by qizhg 1
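
On the first question, one plausible reading (not confirmed by the authors) is a normalization argument: a softmax over a minibatch of N samples sums to 1, so each weight is on the order of 1/N. Multiplying by N makes the weights average to 1, which keeps a mean over weighted losses on the same scale as an unweighted mean. The sketch below checks this with made-up scores; it does not reproduce the repository's code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up per-sample scores for a minibatch of four transitions
scores = [0.2, -1.0, 0.5, 0.0]
weights = softmax(scores)

# Rescale by the batch size so the weights average to 1 instead of 1/N
scaled_weights = [len(weights) * w for w in weights]
mean_scaled = sum(scaled_weights) / len(scaled_weights)

# With equal scores every scaled weight is 1, which recovers an ordinary mean
uniform_scaled = [4 * w for w in softmax([0.0, 0.0, 0.0, 0.0])]

print(sum(weights), mean_scaled, uniform_scaled)
```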


[NeurIPS 2021 Spotlight] Aligning Pretraining for Detection via Object-Level Contrastive Learning

This is an official PyTorch implementation of Task-Adaptive Neural Network Search with Meta-Contrastive Learning (NeurIPS 2021, Spotlight).

[NeurIPS 2021 Spotlight] Code for Learning to Compose Visual Relations

Official repository with code and data accompanying the NAACL 2021 paper "Hurdles to Progress in Long-form Question Answering" (https://arxiv.org/abs/2103.06332).

[ICLR 2021, Spotlight] Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

[ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Code for "The Intrinsic Dimension of Images and Its Impact on Learning" - ICLR 2021 Spotlight

Official Implementation of 'UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers' ICLR 2021(spotlight)

Code accompanying "Learning What To Do by Simulating the Past", ICLR 2021.

Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.

Code accompanying our paper Feature Learning in Infinite-Width Neural Networks

This is the accompanying toolbox for the paper "A Survey on GANs for Anomaly Detection"

Code accompanying the paper "Wasserstein GAN"

PyTorch code accompanying our paper on Maximum Entropy Generators for Energy-Based Models

Datasets accompanying the paper ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers.

Code accompanying the paper "How Tight Can PAC-Bayes be in the Small Data Regime?"

Code repository accompanying the paper "On Adversarial Robustness: A Neural Architecture Search perspective"

This codebase is the official implementation of Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization (NeurIPS2021, Spotlight)

"Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy" (ICLR 2022 Spotlight)