Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks (MAPDN)

Future Power Networks

Last update: Jan 6, 2023

Overview

This is the implementation of the paper Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks.

MAPDN provides an environment for distributed/decentralised active voltage control on power distribution networks, together with a set of state-of-the-art multi-agent actor-critic algorithms that can be trained on it.

The environment implementation follows the multi-agent environment framework provided in PyMARL. Therefore, all baselines that are compatible with that framework can be easily applied to this environment.

Summary of the Repository

This repository includes the following components.

- An environment of active voltage control (decentralised and distributed);
- A training framework for MARL;
- 10 MARL algorithms:
  - IAC, IDDPG, MADDPG, SQDDPG, IPPO, MAPPO, MAAC, MATD3, COMA, and FacMADDPG.
- 5 voltage barrier functions:
  - Bowl, L1, L2, Courant Beltrami, and Bump.
- Implementation of droop control and OPF in Matlab.

A Brief Introduction of the Task

In this section, we briefly introduce the task so that users can understand the objective of this environment.

Objective: Each agent controls a PV inverter that generates reactive power so that the voltage of every bus stays within the safety range defined as $0.95 \ p.u. \leq v_{k} \leq 1.05 \ p.u., \ \forall k \in V$, where $V$ is the set of buses of the whole system and $p.u.$ (per unit) is a unit to measure voltage. Since the agents' decisions influence one another through the physics of the power network, and not every bus has a PV installed, the agents must cooperate to control the voltage of all buses in the network. Moreover, each agent only has access to partial information as its observation. This problem is naturally a Dec-POMDP.
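As a minimal illustration of the safety constraint above, the following sketch checks a list of per-unit bus voltages against the range [0.95, 1.05]. The function name and the sample voltages are hypothetical and are not taken from the MAPDN code base.

```python
V_MIN, V_MAX = 0.95, 1.05  # safety range in p.u., as stated in the objective


def out_of_range_buses(voltages):
    """Return the indices of buses whose voltage lies outside [V_MIN, V_MAX]."""
    return [k for k, v in enumerate(voltages) if not (V_MIN <= v <= V_MAX)]


# Hypothetical voltages (p.u.) for a five-bus example.
sample = [1.00, 0.97, 1.06, 0.94, 1.05]
print(out_of_range_buses(sample))  # [2, 3]
```

An empty result means that every bus in the list satisfies the constraint.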

Action: The reactive power is constrained by the capacity of the equipment, and the capacity is related to the active power of the PV. As a result, the range of reactive power varies dynamically. Mathematically, the reactive power of each PV inverter is represented as $$q_{k}^{\scriptscriptstyle PV} = a_{k} \ \sqrt{(s_{k}^{\scriptscriptstyle \max})^{2} - (p_{k}^{\scriptscriptstyle PV})^{2}},$$ where $s_{k}^{\scriptscriptstyle \max}$ is the maximum apparent power of the $k\text{th}$ node, which depends on the physical capacity of the PV inverter, and $p_{k}^{\scriptscriptstyle PV}$ is the instantaneous PV active power. The action we control is the variable $0 \leq a_{k} \leq 1$, indicating the fraction of the instantaneous capacity of reactive power. For this reason, the action is continuous in this task.
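The action formula can be sketched in a few lines of Python. This is only an illustration of the equation above, not the environment's implementation: the function name, the clipping of $a_{k}$, and the guard against negative headroom are assumptions made here.

```python
import math


def pv_reactive_power(a_k, s_max, p_pv):
    """Evaluate q_k^PV = a_k * sqrt(s_max^2 - p_pv^2) for one PV inverter."""
    # Assumption: clip the action to the documented range [0, 1].
    a_k = min(max(a_k, 0.0), 1.0)
    # Assumption: if p_pv exceeds s_max, no reactive capacity is left.
    headroom = max(s_max ** 2 - p_pv ** 2, 0.0)
    return a_k * math.sqrt(headroom)


# With s_max = 5 and p_pv = 3 the available capacity is sqrt(25 - 9) = 4,
# so an action of 0.5 yields a reactive power of 2.0.
print(pv_reactive_power(0.5, 5.0, 3.0))  # 2.0
```

The example shows why the reactive power range is dynamic: the same action gives a smaller output as the instantaneous active power approaches the inverter's apparent power limit.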

Observation: Each agent can observe the information of the zone where it belongs. For example, in Figure 1 the agent on bus 25 can observe the information in zone 3. Each agent's observation consists of the following variables within the zone:

Load Active Power,
Load Reactive Power,
PV Active Power,
PV Reactive Power,
Voltage.
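To make the observation structure concrete, here is a hedged sketch that concatenates the five listed quantities for every bus in an agent's zone. The dictionary layout, the key names, the ordering of the vector, and the bus numbers are all hypothetical; the actual environment may organise its observations differently.

```python
# Hypothetical key names for the five observed quantities.
FIELDS = ("load_p", "load_q", "pv_p", "pv_q", "voltage")


def zone_observation(bus_data, zone_buses):
    """Build a flat observation: one block per quantity, one entry per bus."""
    return [bus_data[bus][field] for field in FIELDS for bus in zone_buses]


# Hypothetical measurements for three buses assumed to share one zone.
bus_data = {
    23: {"load_p": 0.42, "load_q": 0.20, "pv_p": 0.0, "pv_q": 0.0, "voltage": 0.99},
    24: {"load_p": 0.38, "load_q": 0.18, "pv_p": 0.0, "pv_q": 0.0, "voltage": 1.00},
    25: {"load_p": 0.51, "load_q": 0.25, "pv_p": 1.1, "pv_q": 0.3, "voltage": 1.02},
}
obs = zone_observation(bus_data, [23, 24, 25])
print(len(obs))  # 15
```

In this sketch, buses without a PV simply report zero PV active and reactive power, so every agent in the same zone receives a vector of the same length.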

Figure 1: Illustration of the 33-bus network. Each bus is indexed by a circle with a number. Four control regions are partitioned by the shortest path from the terminal to the main branch (bus 1-6). We control the voltage on bus 2-33, whereas bus 0-1 represent the substation with constant voltage and infinite active and reactive power capacity. G represents an external generator, small Ls represent loads, and the sun emoji marks the locations where a PV is installed.

Reward: The reward function is defined as follows: $$\mathit{r} = - \frac{1}{|V|} \sum_{i \in V} l_{v}(v_{i}) - \alpha \cdot l_{q}(\mathbf{q}^{\scriptscriptstyle PV}),$$ where $l_{v}(\cdot)$ is a voltage barrier function that measures whether the voltage of a bus is within the safety range; $l_{q}(\mathbf{q}^{\scriptscriptstyle PV})=\frac{1}{|\mathcal{I}|}||\mathbf{q}^{\scriptscriptstyle PV}||_{1}$ can be seen as a simple approximation of power loss, where $\mathbf{q}^{\scriptscriptstyle PV}$ is the vector of the agents' reactive power, $\mathcal{I}$ is the set of agents, and $\alpha$ is a multiplier that balances voltage control against the generation of reactive power. In this work, we investigate different forms of $l_{v}(\cdot)$. In short, this reward function aims to control the voltage while minimising the power loss, which is correlated with the economic loss.
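The reward can be sketched as follows. The barrier used here (absolute deviation from 1.0 p.u.) is only a stand-in chosen for illustration, since the exact forms of the barrier functions are not given in this README; the default value of `alpha` is likewise just an example.

```python
def l1_like_barrier(v, v_ref=1.0):
    """Stand-in voltage barrier (an assumption): absolute deviation from v_ref."""
    return abs(v - v_ref)


def reward(voltages, q_pv, alpha=0.1, barrier=l1_like_barrier):
    """r = -(1/|V|) * sum_i l_v(v_i) - alpha * (1/|I|) * ||q_pv||_1."""
    voltage_term = sum(barrier(v) for v in voltages) / len(voltages)
    q_term = sum(abs(q) for q in q_pv) / len(q_pv)  # mean absolute reactive power
    return -voltage_term - alpha * q_term


# Hypothetical values: three bus voltages (p.u.) and two agents' reactive power.
print(reward([1.0, 0.98, 1.04], [0.2, -0.4]))
```

Passing a different `barrier` callable is one way to compare alternative forms of $l_{v}(\cdot)$ without changing the rest of the computation.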

Installation of the Dependencies

Install Anaconda.
After cloning or downloading this repository, ensure that the current directory is [your own parent path]/MAPDN.
Execute the following command.
```
conda env create -f environment.yml
```
Activate the installed virtual environment using the following command.
```
conda activate mapdn
```

Downloading the Dataset

Download the data from the link.
Unzip the zip file and you can see the following 3 folders:
- case33_3min_final
- case141_3min_final
- case322_3min_final
Go to the directory [Your own parent path]/MAPDN/environments/var_voltage_control/ and create a folder called data.
Move the 3 extracted folders into the directory [Your own parent path]/MAPDN/environments/var_voltage_control/data/.
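The resulting layout can be verified with a short script. This is a convenience sketch, not part of the repository; the function name is hypothetical, and the folder names and directory path are the ones listed above.

```python
from pathlib import Path

# Scenario folders listed in the download instructions.
EXPECTED = ("case33_3min_final", "case141_3min_final", "case322_3min_final")


def missing_scenarios(repo_root):
    """Return the expected scenario folders that are absent under the data directory."""
    data_dir = Path(repo_root) / "environments" / "var_voltage_control" / "data"
    return [name for name in EXPECTED if not (data_dir / name).is_dir()]


if __name__ == "__main__":
    # "." assumes the script is run from the MAPDN directory.
    print(missing_scenarios("."))
```

An empty list indicates that all three scenario folders are in place.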

Two modes of Tasks

Background

There are 2 modes of tasks included in this environment, i.e. distributed active voltage control and decentralised active voltage control. Distributed active voltage control is the task introduced in the paper Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks, whereas decentralised active voltage control is the task that most prior works considered. The primary difference between the 2 modes is that in decentralised active voltage control all the equipment in a zone is controlled by one agent, while in distributed active voltage control each piece of equipment is controlled by its own agent (see Figure 1).
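The difference between the two modes can be illustrated by how agents are assigned to PV inverters. The sketch below uses a hypothetical placement of six PV buses over four zones; the bus numbers, the zone assignment, and the function name are assumptions for illustration only.

```python
def agents_for_mode(pv_to_zone, mode):
    """Map each agent name to the list of PV buses it controls."""
    if mode == "distributed":
        # One agent per PV inverter.
        return {f"agent_{i}": [bus] for i, bus in enumerate(sorted(pv_to_zone))}
    if mode == "decentralised":
        # One agent per zone, controlling every PV inverter in that zone.
        zones = {}
        for bus, zone in sorted(pv_to_zone.items()):
            zones.setdefault(zone, []).append(bus)
        return {f"agent_{i}": buses for i, (_, buses) in enumerate(sorted(zones.items()))}
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical placement: PV bus -> zone.
pv_to_zone = {13: 1, 18: 1, 22: 2, 25: 3, 29: 4, 33: 4}
print(len(agents_for_mode(pv_to_zone, "distributed")))    # 6
print(len(agents_for_mode(pv_to_zone, "decentralised")))  # 4
```

With this placement, the distributed mode yields six single-inverter agents, while the decentralised mode yields four agents, two of which control two inverters each.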

How to use?

To run distributed active voltage control, set the argument for train.py and test.py as follows.

python train.py --mode distributed

python test.py --mode distributed

To run decentralised active voltage control, set the argument for train.py and test.py as follows.

python train.py --mode decentralised

python test.py --mode decentralised

Quick Start

Training Your Model

You can train a model on a power system using the following command.

python train.py --alg matd3 --alias 0 --mode distributed --scenario case33_3min_final --voltage-barrier-type l1 --save-path trial

The meanings of the arguments are as follows:

--alg indicates the MARL algorithm you would like to use.
--alias is the alias to distinguish different experiments.
--mode is the mode of the environment. There are 2 modes, i.e. distributed and decentralised. Distributed mode is the one introduced in this work, whereas decentralised mode is the traditional environment used by prior works.
--scenario indicates the power system on which you would like to train.
--voltage-barrier-type indicates the voltage barrier function you would like to use for training.
--save-path is the path where the model, TensorBoard logs and configurations are saved.
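When several runs are needed, the command line can be assembled programmatically. The helper below is a hypothetical convenience sketch that only reuses the flags documented above; it prints the commands rather than executing them.

```python
def build_train_command(alg, alias, mode, scenario, barrier, save_path):
    """Assemble the train.py argument list from the documented flags."""
    return [
        "python", "train.py",
        "--alg", alg,
        "--alias", str(alias),
        "--mode", mode,
        "--scenario", scenario,
        "--voltage-barrier-type", barrier,
        "--save-path", save_path,
    ]


# Print one command per voltage barrier function.
for barrier in ("l1", "l2", "bowl"):
    cmd = build_train_command("matd3", 0, "distributed", "case33_3min_final", barrier, "trial")
    print(" ".join(cmd))
```

Each list can also be passed to `subprocess.run(cmd, check=True)` from the MAPDN directory to launch the run directly.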

Testing Your Model

After training, you can test your model separately for further analysis using the following command.

python test.py --save-path trial/model_save --alg matd3 --alias 0 --scenario case33_3min_final --voltage-barrier-type l1 --test-mode single --test-day 730 --render

The meanings of the arguments are as follows:

--alg indicates the MARL algorithm you used.
--alias is the alias you used to distinguish different experiments.
--mode is the mode of the environment you used to train your model.
--scenario indicates the power system on which you trained your model.
--voltage-barrier-type indicates the voltage barrier function you used for training.
--save-path is the path where you saved your model. You need to give the parent path, including the directory model_save (e.g. trial/model_save).
--test-mode is the test mode you would like to use. There are 2 modes you can use, i.e. single and batch.
--test-day is the day on which you would like to run the test. Note that it only takes effect if --test-mode is single.
--render activates the rendering of the environment.

Interaction with Environment

A simple use of the environment is shown in the following code.

state, global_state = env.reset()

for t in range(240):
    actions = agents.get_actions() # a vector involving all agents' actions
    reward, done, info = env.step(actions)
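For readers who want to run the loop without the full environment, here is a self-contained sketch. `StubVoltageEnv` is a hypothetical stand-in that mimics only the `reset` and `step` signatures shown above; its observations and reward are placeholders and do not model a power network.

```python
import random


class StubVoltageEnv:
    """Hypothetical stand-in mimicking only the reset/step signatures."""

    def __init__(self, n_agents=6, episode_limit=240):
        self.n_agents = n_agents
        self.episode_limit = episode_limit
        self.t = 0

    def reset(self):
        self.t = 0
        obs = [[0.0] for _ in range(self.n_agents)]  # placeholder observations
        return obs, [0.0]  # (per-agent observations, global state)

    def step(self, actions):
        self.t += 1
        reward = -sum(abs(a) for a in actions) / len(actions)  # placeholder reward
        done = self.t >= self.episode_limit
        return reward, done, {}


def run_episode(env, policy, max_steps=240):
    """Roll out one episode and return (episode return, number of steps)."""
    env.reset()
    total, steps = 0.0, 0
    for _ in range(max_steps):
        # The placeholder policy ignores observations and samples each action.
        actions = [policy() for _ in range(env.n_agents)]
        reward, done, _info = env.step(actions)
        total += reward
        steps += 1
        if done:
            break
    return total, steps


random.seed(0)
episode_return, steps = run_episode(StubVoltageEnv(), random.random)
print(steps, episode_return)
```

Replacing the stub with the real environment and the random policy with trained agents keeps the loop structure unchanged.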

Reproduce the Results in the Paper

Users can reproduce the results shown in the paper by running the bash scripts provided in this repository with the default configurations, e.g.,

source train_case33.sh 0 l1 reproduction

source train_case33.sh 0 l2 reproduction

source train_case33.sh 0 bowl reproduction

source train_case141.sh 0 l1 reproduction

source train_case141.sh 0 l2 reproduction

source train_case141.sh 0 bowl reproduction

source train_case322.sh 0 l1 reproduction

source train_case322.sh 0 l2 reproduction

source train_case322.sh 0 bowl reproduction

The arguments of the above bash scripts are as follows.

$1: --alias
$2: --voltage-barrier-type
$3: --save-path

Note: these training scripts assume that you have at least 2 GPUs with 12 GB memory. If your machine does not meet this requirement, please manually modify the allocation of GPUs. The results in the paper were produced on GeForce RTX 2080Ti GPUs.

Brief Introduction of Scenarios

We show the basic settings of all scenarios provided in this repository.

| Scenario | No. Loads | No. Regions | No. PVs (Agents) | $p_{\text{max}}^{\scriptscriptstyle{L}}$ | $p_{\text{max}}^{\scriptscriptstyle{PV}}$ |
| --- | --- | --- | --- | --- | --- |
| Case33 | 32 | 4 | 6 | 3.5 MW | 8.75 MW |
| Case141 | 84 | 9 | 22 | 20 MW | 80 MW |
| Case322 | 337 | 22 | 38 | 1.5 MW | 3.75 MW |

Traditional Control

Downloading Data

Download the data from the link.
Extract the case files and move them to the directory [Your own parent path]/MAPDN/traditional_control.

Running

The traditional control methods are implemented in Matlab using MATPOWER. Please ensure that the latest version of MATPOWER is installed before running them.

Reproduce the results for droop control by running the file pf_droop_matpower_all.m.
Reproduce the results for OPF by running the file opf_matpower_all.m.

See the comments in the files for more details.

API Usage

For more details of this environment, users can check the upcoming API Docs.

Citation

If you use this environment or part of this work, please cite the following paper.

@misc{wang2021multiagent,
      title={Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks}, 
      author={Jianhong Wang and Wangkun Xu and Yunjie Gu and Wenbin Song and Tim C. Green},
      year={2021},
      eprint={2110.14300},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Contact

If you have any issues or would like to collaborate, please feel free to contact me via [email protected].

Comments

Why is reactive power loss defined as “q = self.powergrid.res_sgen["q_mvar"].sort_index().to_numpy(copy=True)”?

Code in voltage_control_env.py:

    # reactive power (q) loss
    q = self.powergrid.res_sgen["q_mvar"].sort_index().to_numpy(copy=True)
    q_loss = np.mean(np.abs(q))

1. Why is reactive power loss defined as “q = self.powergrid.res_sgen["q_mvar"].sort_index().to_numpy(copy=True)”? 2. How should I understand the line "q_loss = np.mean(np.abs(q))"?

Thank you very much for answering my question in your busy schedule.
opened by zly987 6
After I run conda env create -f environment.yml in PyCharm, I run into some problems.
First, I receive a warning: "Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies..." It prints indefinitely without any further action.

Then I add pip under the dependencies, as follows:

    dependencies:
      - pip
      - _libgcc_mutex=0.1=main
      - argon2-cffi=20.1.0=py37h8f50634_2

I try again and then receive: Solving environment: failed ResolvePackageNotFound:

What should I be doing instead?
opened by Shenjwei 6
The question about Observation Probability Function

Dear author: Thank you very much for making the code public. I have two questions after reading your article and code. First, the definition of the observation probability function is the probability of joint observations. I don't understand why you define the observation probability function as the measurement errors that may occur in sensors. Second, in Section 4.1 of your article you said "Specifically, $\Omega(o_{t+1} | s_{t+1}, a_{t}) = s_{t+1} + \mathcal{N}(0, \Sigma)$, where $\mathcal{N}(0, \Sigma)$ is an isotropic multi-variable Gaussian distribution and $\Sigma$ is dependent on the physical properties of sensors". However, I can't find the observation probability function in your code. I would appreciate it if you could answer my questions. Thank you so much! Best wishes

opened by Louis-Quan 5
Some problems of test

I'm sorry to bother you again. I use the command

python test.py --save-path trial322/model_save --alg maddpg --alias 0 --scenario case322_3min_final --voltage-barrier-type l1 --test-mode single --test-day 730 --render

Then the program raises the following error:

Traceback (most recent call last):
  File "test.py", line 108, in
    record = test.run(argv.test_day, 15, 1)
  File "F:\MAPDN\utilities\tester.py", line 43, in run
    self.env.render()
  File "F:\MAPDN\environments\var_voltage_control\voltage_control_env.py", line 648, in render
    self.init_render()
  File "F:\MAPDN\environments\var_voltage_control\voltage_control_env.py", line 642, in init_render
    from .rendering_voltage_control_env import Viewer
  File "F:\MAPDN\environments\var_voltage_control\rendering_voltage_control_env.py", line 6, in
    from gym import error
  File "D:\anaconda3\envs\mapdn\lib\site-packages\gym_init.py", line 13, in
    from gym.envs import make, spec, register
  File "D:\anaconda3\envs\mapdn\lib\site-packages\gym\envs_init.py", line 10, in
    load_env_plugins()
  File "D:\anaconda3\envs\mapdn\lib\site-packages\gym\envs\registration.py", line 276, in load_env_plugins
    fn = plugin.load()
  File "D:\anaconda3\envs\mapdn\lib\site-packages\importlib_metadata_init.py", line 194, in load
    module = import_module(match.group('module'))
  File "D:\anaconda3\envs\mapdn\lib\importlib_init_.py", line 127, in import_module
    return _bootstrap.gcd_import(name[level:], package, level)
  File "D:\anaconda3\envs\mapdn\lib\site-packages\ale_py\gym.py", line 5, in
    from ale_py.roms.utils import rom_name_to_id, rom_id_to_name
  File "D:\anaconda3\envs\mapdn\lib\site-packages\ale_py\roms_init.py", line 94, in
    _RESOLVED_ROMS = resolve_roms()
  File "D:\anaconda3\envs\mapdn\lib\site-packages\ale_py\roms_init.py", line 46, in resolve_roms
    supported, unsupported = package.resolve()
  File "D:\anaconda3\envs\mapdn\lib\site-packages\ale_py\roms\utils.py", line 60, in resolve
    lambda file: file.suffix == ".bin", resources.files(self.package).iterdir()
  File "D:\anaconda3\envs\mapdn\lib\site-packages\importlib_resources_common.py", line 22, in files
    return from_package(get_package(package))
  File "D:\anaconda3\envs\mapdn\lib\site-packages\importlib_resources_common.py", line 53, in get_package
    resolved = resolve(package)
  File "D:\anaconda3\envs\mapdn\lib\site-packages\importlib_resources_common.py", line 44, in resolve
    return cand if isinstance(cand, types.ModuleType) else importlib.import_module(cand)
  File "D:\anaconda3\envs\mapdn\lib\importlib_init.py", line 127, in import_module
    return bootstrap.gcd_import(name[level:], package, level)
  File "D:\anaconda3\envs\mapdn\lib\site-packages\atari_py_init.py", line 1, in
    from .ale_python_interface import *
  File "D:\anaconda3\envs\mapdn\lib\site-packages\atari_py\ale_python_interface.py", line 18, in
    'ale_interface/ale_c.dll'))
  File "D:\anaconda3\envs\mapdn\lib\ctypes_init.py", line 442, in LoadLibrary
    return self.dlltype(name)
  File "D:\anaconda3\envs\mapdn\lib\ctypes_init.py", line 364, in init
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126]

What should I do next, and how should I use the module rendering_voltage_control_env.py?

opened by Shenjwei 5
Question about random seeds

In your article: “During training, we randomly sample the initial state for an episode and each episode lasts for 240 time steps (i.e. a half day). Every experiment is run with 5 random seeds and the test results during training are given by the median and the 25%-75% quartile shading. Each test is conducted every 20 episodes with 10 random selected episodes for evaluation.”

“Every experiment is run with 5 random seeds”, but I couldn't find this in the code. Can you tell me how to add random seeds? I have tried, but it doesn't work.

opened by zly987 4
how can your environment be easily and flexibly extended with more network topologies and data

Dear authors, in your paper "Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks" you said that the primary limitation of OPF is the need for an exact system model, and that your environment can be easily and flexibly extended with more network topologies and data.

In the training stage, the agents have to be trained with pandapower based on precise network cases. How can the training results be extended to other topologies?

opened by zzqiang163 3
q_weight= 0.1

Is q_weight set to 0.1 so that the term has the same order of magnitude as the voltage deviation value? Can it be understood that the weights of voltage deviation and power loss are 0.5 each?

opened by zly987 2
Questions to ask about the output of test.py

    if argv.test_mode == 'single':
        # record = test.run(199, 23, 2) # (day, hour, 3min)
        # record = test.run(730, 23, 2) # (day, hour, 3min)
        record = test.run(argv.test_day, 23, 2)
        with open('test_record_'+log_name+f'_day{argv.test_day}'+'.pickle', 'wb') as f:
            pickle.dump(record, f, pickle.HIGHEST_PROTOCOL)

I don't understand the line “record = test.run(argv.test_day, 23, 2)”. If test_day=730, when does the PV data start? Can you give me a specific moment to make it easier for me to understand? And why is the third parameter 2 instead of 3? If I want to test with the data of the last day, that is, day 1095, is the first parameter 1094?

opened by zly987 2
Request: the source file of the model.p file in the var_voltage_control file/case33_3min_final

Thank you very much for open-sourcing the multi-agent environment. I have one request, and I hope it is not too much to ask. I have converted the model.p file to an .npy file and learned some information from it, but I would like to know more about the source file from which model.p was produced.

opened by zly987 2
another problem Occurs during installation

Sorry to disturb you again. I encountered some problems when installing the packages. At first it ran very smoothly, but then another problem interrupted the installation.

Pip subprocess error: ERROR: Could not install packages due to an OSError: [WinError 5] : 'd:\anaconda3\envs\mapdn\scripts\pygmentize.exe' Consider using the --user option or check the permissions.

failed

CondaEnvException: Pip failed

So how should I adjust the program?

opened by Shenjwei 2
is communication helpful in the MARL

Dear author, I am wondering is it helpful if the agent incorporates other agents' information (e.g., state, policy) during testing, or it is not necessary for DER to communicate with each other when they are connected to the main grid. Thank you for your time and insights!

opened by DongChen06 1
For: Issue for "AttributeError: 'SQDDPG' object has no attribute 'agent_importance_vec'"

Hi, I am trying to reproduce the results and hit the following issue in "sqddpg_33_0_l1.out" while running "source train_case33.sh 0 l1 reproduction":

Traceback (most recent call last):
  File "train.py", line 115, in
    train.run(stat, i)
  File "/home/ben/ben/MAPDN/utilities/trainer.py", line 111, in run
    self.behaviour_net.train_process(stat, self)
  File "/home/ben/ben/MAPDN/models/model.py", line 213, in train_process
    value = self.value(state_, action_pol)
  File "/home/ben/ben/MAPDN/models/sqddpg.py", line 108, in value
    return self.marginal_contribution(obs, act)
  File "/home/ben/ben/MAPDN/models/sqddpg.py", line 66, in marginal_contribution
    subcoalition_map, grand_coalitions, individual_map = self.sample_grandcoalitions(batch_size) # shape = (b, n_s, n, n)
  File "/home/ben/ben/MAPDN/models/sqddpg.py", line 38, in sample_grandcoalitions
    agent_importance_vec = self.agent_importance_vec.unsqueeze(0).expand(batch_size*self.sample_size, self.n_)
  File "/home/ben/anaconda3/envs/mapdn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1266, in getattr
    type(self).name, name))
AttributeError: 'SQDDPG' object has no attribute 'agent_importance_vec'

Could you help with this?

thanks

opened by yqlwqj 1
issue in setting up test file of the code

I am trying to replicate the results you obtained with this code. I struggled with the training setup at first, but somehow managed to run train.py. After training, I am trying to run test.py and I am getting this error. I am replicating your paper's results and the comparison you made for a project in one of my courses. Any help will be appreciated.

opened by AizazHaider51214 1
Question about the 322-bus network

Dear authors, thanks for the wonderful paper and repo. I have a question about how to generate the 322-bus network with Simbench. I read the Simbench paper and found the mentioned networks below. I did not find the corresponding 322-bus network as well as on the Simbench website. Thank you for your time and insights!

opened by DongChen06 4

Releases(v1.0.4)

v1.0.4(Jan 30, 2022)

fix errors on specifying the voltage barrier type: https://github.com/Future-Power-Networks/MAPDN/commit/556231c870c543076dcdc764616d9e596986230a

fix the error on capture demand: https://github.com/Future-Power-Networks/MAPDN/commit/81aa2a393d789ca32dff6e6a718e9ba03f289e0a
v1.0.3(Nov 15, 2021)

Add env setup for win
v1.0.2(Nov 12, 2021)

some improvements on readme.
v1.0.1(Nov 1, 2021)

do some refinements and fix some bugs.

Owner

Future Power Networks

An inter-university research union for theories and technologies to support renewable power networks.
