Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks (MAPDN)

Overview

This is the implementation of the paper Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks.

MAPDN provides an environment for distributed/decentralised active voltage control on power distribution networks, together with a batch of state-of-the-art multi-agent actor-critic algorithms that can be used for training.

The environment implementation follows the multi-agent environment framework provided in PyMARL. Therefore, all baselines that are compatible with that framework can be easily applied to this environment.


Summary of the Repository

This repository includes the following components.

  • An environment of active voltage control (decentralised and distributed);

  • A training framework for MARL;

  • 10 MARL algorithms;

  • 5 voltage barrier functions;

    • Bowl, L1, L2, Courant Beltrami, and Bump.
  • Implementation of droop control and OPF in Matlab.


A Brief Introduction of the Task

In this section, we give a brief introduction to the task so that users can easily understand the objective of this environment.

Objective: Each agent controls a PV inverter that generates reactive power so that the voltage of each bus is kept within the safety range defined as $0.95 \ p.u. \leq v_{k} \leq 1.05 \ p.u., \ \forall k \in V$, where $V$ is the set of buses of the whole system and $p.u.$ (per unit) is the unit used to measure voltage. Since the agents' decisions influence one another through the physics of the power network, and not every bus is equipped with a PV, the agents must cooperate to control the voltages of all buses in the network. Moreover, each agent only observes partial information about the network. This problem is therefore naturally a Dec-POMDP.

Action: The reactive power is constrained by the capacity of the equipment, and the capacity is related to the active power of the PV. As a result, the range of the reactive power varies dynamically. Mathematically, the reactive power of each PV inverter is represented as $$q_{k}^{\scriptscriptstyle PV} = a_{k} \ \sqrt{(s_{k}^{\scriptscriptstyle \max})^{2} - (p_{k}^{\scriptscriptstyle PV})^{2}},$$ where $s_{k}^{\scriptscriptstyle \max}$ is the maximum apparent power of the $k\text{th}$ node, which depends on the physical capacity of the PV inverter, and $p_{k}^{\scriptscriptstyle PV}$ is the instantaneous PV active power. The action we control is the variable $0 \leq a_{k} \leq 1$, indicating the percentage of the instantaneous reactive power capacity that is used. For this reason, the action is continuous in this task.
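For intuition, here is a minimal sketch (not code from this repository; the names are illustrative) of how the reactive power follows from an action $a_{k}$:

import math

def reactive_power(a_k, s_max, p_pv):
    """Reactive power q_k^PV injected by a PV inverter.

    a_k   -- action in [0, 1], the fraction of the available reactive capacity
    s_max -- maximum apparent power of the inverter (MVA)
    p_pv  -- instantaneous PV active power (MW)
    """
    # The available reactive capacity shrinks as the active power grows.
    q_capacity = math.sqrt(max(s_max**2 - p_pv**2, 0.0))
    return a_k * q_capacity

# Example: a 0.5 MVA inverter currently producing 0.3 MW of active power
print(reactive_power(0.8, 0.5, 0.3))  # 0.8 * sqrt(0.25 - 0.09) = 0.32 Mvar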

Observation: Each agent can only observe the information of the zone it belongs to. For example, in Figure 1 the agent on bus 25 can observe the information in Zone 3. Each agent's observation consists of the following variables within its zone:

  • Load Active Power,
  • Load Reactive Power,
  • PV Active Power,
  • PV Reactive Power,
  • Voltage.

Figure 1: Illustration of the 33-bus network. Each bus is indexed by a circled number. Four control regions are partitioned by the shortest path from the terminal to the main branch (buses 1-6). We control the voltage on buses 2-33, whereas buses 0-1 represent the substation with constant voltage and infinite active and reactive power capacity. G represents an external generator; the small L symbols represent loads; and the sun symbol marks a bus where a PV is installed.

Reward: The reward function is shown as follows: $$\mathit{r} = - \frac{1}{|V|} \sum_{i \in V} l_{v}(v_{i}) - \alpha \cdot l_{q}(\mathbf{q}^{\scriptscriptstyle PV}),$$ where $l_{v}(\cdot)$ is a voltage barrier function that measures whether the voltage of a bus is within the safety range; $l_{q}(\mathbf{q}^{\scriptscriptstyle PV})=\frac{1}{|\mathcal{I}|}||\mathbf{q}^{\scriptscriptstyle PV}||_{1}$ can be seen as a simple approximation of the power loss, where $\mathbf{q}^{\scriptscriptstyle PV}$ is the vector of agents' reactive powers, $\mathcal{I}$ is the set of agents, and $\alpha$ is a multiplier that adjusts the balance between voltage control and the generation of reactive power. In this work, we investigate different forms of $l_{v}(\cdot)$. In short, this reward function aims to control the voltage while minimising the reactive power generation, which is correlated with the power loss and hence the economic loss.
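As an illustrative sketch (not the repository's implementation), the reward for the L1 barrier could be computed as follows, assuming the L1 barrier simply penalises each bus voltage's deviation from the nominal 1.0 p.u. and that alpha is the balance multiplier:

import numpy as np

def l1_barrier(v):
    # Assumed L1 voltage barrier: penalise deviation from the nominal 1.0 p.u.
    return np.abs(v - 1.0)

def reward(voltages, q_pv, alpha=0.1):
    """r = -(1/|V|) * sum_i l_v(v_i) - alpha * l_q(q^PV).

    voltages -- per-bus voltages in p.u. (the set of buses V)
    q_pv     -- per-agent reactive power outputs (the set of agents I)
    alpha    -- balance multiplier (the value 0.1 is only an example)
    """
    voltage_loss = np.mean(l1_barrier(np.asarray(voltages)))
    q_loss = np.mean(np.abs(np.asarray(q_pv)))  # (1/|I|) * ||q^PV||_1
    return -voltage_loss - alpha * q_loss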


Installation of the Dependencies

  1. Install Anaconda.
  2. After cloning or downloading this repository, ensure that the current directory is [your own parent path]/MAPDN.
  3. Execute the following command.
    conda env create -f environment.yml
  4. Activate the installed virtual environment using the following command.
    conda activate mapdn
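Optionally, you can verify that the environment was created correctly; this assumes that torch and pandapower are among the installed dependencies:

    conda activate mapdn
    python -c "import torch, pandapower; print(torch.__version__, pandapower.__version__)"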

Downloading the Dataset

  1. Download the data from the link.

  2. Unzip the zip file and you can see the following 3 folders:

    • case33_3min_final
    • case141_3min_final
    • case322_3min_final
  3. Go to the directory [Your own parent path]/MAPDN/environments/var_voltage_control/ and create a folder called data.

  4. Move the 3 folders extracted in step 2 to the directory [Your own parent path]/MAPDN/environments/var_voltage_control/data/.
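For example, assuming the archive was unzipped to /path/to/extracted (a placeholder for your own location), steps 3-4 correspond to:

    cd [Your own parent path]/MAPDN/environments/var_voltage_control
    mkdir data
    mv /path/to/extracted/case33_3min_final data/
    mv /path/to/extracted/case141_3min_final data/
    mv /path/to/extracted/case322_3min_final data/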


Two Modes of Tasks

Background

There are 2 modes of tasks included in this environment, i.e. distributed active voltage control and decentralised active voltage control. Distributed active voltage control is the task introduced in the paper Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks, whereas decentralised active voltage control is the task that most prior works considered. The primary difference between these 2 modes is that in decentralised active voltage control all the devices in a zone are controlled by a single agent, while in distributed active voltage control each device is controlled by its own agent (see Figure 1).

How to use?

If you want to run distributed active voltage control, set the argument of train.py and test.py as follows.

python train.py --mode distributed
python test.py --mode distributed

If you want to run decentralised active voltage control, set the argument of train.py and test.py as follows.

python train.py --mode decentralised
python test.py --mode decentralised

Quick Start

Training Your Model

You can train a model on a power system by executing the following command.

python train.py --alg matd3 --alias 0 --mode distributed --scenario case33_3min_final --voltage-barrier-type l1 --save-path trial

The meanings of the arguments are as follows:

  • --alg indicates the MARL algorithm you would like to use.
  • --alias is the alias to distinguish different experiments.
  • --mode is the mode of the environment, i.e. distributed or decentralised. The distributed mode is the one introduced in this work, whereas the decentralised mode is the traditional setting used by prior works.
  • --scenario indicates the power system on which you would like to train.
  • --voltage-barrier-type indicates the voltage barrier function you would like to use for training.
  • --save-path is the path where the model, the TensorBoard logs and the configurations are saved.
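For example, a run with a different algorithm and barrier function might look like the command below (maddpg and bowl are taken from the options mentioned elsewhere in this repository; the alias is arbitrary):

python train.py --alg maddpg --alias 1 --mode decentralised --scenario case141_3min_final --voltage-barrier-type bowl --save-path trial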

Testing Your Model

After training, you can test your model separately for further analysis using the following command.

python test.py --save-path trial/model_save --alg matd3 --alias 0 --scenario case33_3min_final --voltage-barrier-type l1 --test-mode single --test-day 730 --render

The meanings of the arguments are as follows:

  • --alg indicates the MARL algorithm you used.
  • --alias is the alias you used to distinguish different experiments.
  • --mode is the mode of the environment you used to train your model.
  • --scenario indicates the power system on which you trained your model.
  • --voltage-barrier-type indicates the voltage barrier function you used for training.
  • --save-path is the path where you saved your model. You only need to give the parent path up to and including the directory model_save (e.g. trial/model_save).
  • --test-mode is the test mode you would like to use. There are 2 modes you can use, i.e. single and batch.
  • --test-day is the day that you would like to do the test. Note that it is only activated if the --test-mode is single.
  • --render indicates activating the rendering of the environment.
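For example, to evaluate the same trained model in batch mode (where --test-day is ignored), a command along the following lines should work:

python test.py --save-path trial/model_save --alg matd3 --alias 0 --scenario case33_3min_final --voltage-barrier-type l1 --test-mode batch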

Interaction with Environment

A simple interaction with the environment is shown in the following code.

state, global_state = env.reset()

for t in range(240): # one episode lasts 240 time steps, i.e. half a day at 3-minute resolution
    actions = agents.get_actions() # a vector containing all agents' actions (placeholder for your policy)
    reward, done, info = env.step(actions)
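If you do not yet have trained agents, a random policy can stand in for agents.get_actions(); this sketch assumes the actions form a NumPy array with one value in [0, 1] per agent (e.g. 6 agents for Case33 in the distributed mode):

import numpy as np

n_agents = 6  # e.g. the 6 PV agents (inverters) in Case33

state, global_state = env.reset()
for t in range(240):  # half a day at 3-minute resolution
    actions = np.random.uniform(0.0, 1.0, size=n_agents)  # random ratios a_k
    reward, done, info = env.step(actions)
    if done:
        break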

Reproduce the Results in the Paper

Users can easily reproduce the results shown in the paper by running the bash scripts with the default configurations provided in this repository, e.g.,

source train_case33.sh 0 l1 reproduction
source train_case33.sh 0 l2 reproduction
source train_case33.sh 0 bowl reproduction
source train_case141.sh 0 l1 reproduction
source train_case141.sh 0 l2 reproduction
source train_case141.sh 0 bowl reproduction
source train_case322.sh 0 l1 reproduction
source train_case322.sh 0 l2 reproduction
source train_case322.sh 0 bowl reproduction

The arguments of the above bash scripts are as follows.

$1: --alias
$2: --voltage-loss-type
$3: --save-path

Note: these training scripts assume that you have at least 2 GPUs, each with 12 GB of memory. If this does not match your local setup, please manually modify the GPU allocation. The results in the paper were produced on GeForce RTX 2080 Ti GPUs.
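For example, if you only have a single GPU, one option is to restrict the visible devices before launching a script (you may still need to edit the device indices inside the scripts, depending on how they allocate GPUs):

    export CUDA_VISIBLE_DEVICES=0
    source train_case33.sh 0 l1 reproduction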


Brief Introduction of Scenarios

We show the basic settings of all scenarios provided in this repository.

| Scenario | No. Loads | No. Regions | No. PVs (Agents) | $p_{\text{max}}^{\scriptscriptstyle{L}}$ | $p_{\text{max}}^{\scriptscriptstyle{PV}}$ |
|----------|-----------|-------------|------------------|------------------------------------------|-------------------------------------------|
| Case33   | 32        | 4           | 6                | 3.5 MW                                   | 8.75 MW                                   |
| Case141  | 84        | 9           | 22               | 20 MW                                    | 80 MW                                     |
| Case322  | 337       | 22          | 38               | 1.5 MW                                   | 3.75 MW                                   |

Traditional Control

Downloading Data

  1. Download the data from the link.
  2. Extract the case files and move them to the directory [Your own parent path]/MAPDN/traditional_control.

Running

The traditional control methods are implemented in Matlab, built on MATPOWER. Please ensure that the latest version of MATPOWER is installed before running them.

  • Reproduce the results for droop control by running the file pf_droop_matpower_all.m.

  • Reproduce the results for OPF by running the file opf_matpower_all.m.

See the annotation in the files for more details.


API Usage

For more details of this environment, users can check the upcoming API Docs.


Citation

If you use this environment or any part of this work, please cite the following paper.

@misc{wang2021multiagent,
      title={Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks}, 
      author={Jianhong Wang and Wangkun Xu and Yunjie Gu and Wenbin Song and Tim C. Green},
      year={2021},
      eprint={2110.14300},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Contact

If you have any issues or any interest in cooperation, please feel free to contact me via [email protected].

Comments
  • Why is reactive power loss defined as “q = self.powergrid.res_sgen["q_mvar"].sort_index().to_numpy(copy=True)”

    code in voltage_control_env.py

        # reactive power (q) loss 
        q = self.powergrid.res_sgen["q_mvar"].sort_index().to_numpy(copy=True)
        q_loss = np.mean(np.abs(q))
    

    1. Why is reactive power loss defined as “q = self.powergrid.res_sgen["q_mvar"].sort_index().to_numpy(copy=True)”? 2. How should I understand the line “q_loss = np.mean(np.abs(q))”?

    Thank you very much for answering my question in your busy schedule.

    opened by zly987 6
  • After I run the conda env create -f environment.yml in pycharm, I meet some problems:

    First, I receive a warning: "Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies...". It prints indefinitely without taking any further action.

    Then I add pip under the dependencies, like the following:

        dependencies:
          - pip
          - _libgcc_mutex=0.1=main
          - argon2-cffi=20.1.0=py37h8f50634_2

    I try again and then I receive: "Solving environment: failed. ResolvePackageNotFound:"

    What should I be doing instead?

    opened by Shenjwei 6
  •  The question about Observation Probability Function

    Dear author: Thank you very much for providing the code to the public. I have two questions after reading your article and code. First, the definition of the observation probability function is the probability of joint observations; I don't understand why you define it via the measurement errors that may occur in sensors. Second, in Section 4.1 of your article you said "Specifically, Ω(o_{t+1}|s_{t+1}, a_t) = s_{t+1} + N(0, Σ), where N(0, Σ) is an isotropic multi-variable Gaussian distribution and Σ is dependent on the physical properties of sensors". However, I can't find the observation probability function in your code. I'd appreciate it if you could answer my question. Thank you so much! Best wishes

    opened by Louis-Quan 5
  • Some problems of test

    I'm sorry to bother you again. I use the command python test.py --save-path trial322/model_save --alg maddpg --alias 0 --scenario case322_3min_final --voltage-barrier-type l1 --test-mode single --test-day 730 --render and then the program raises the following error:

        Traceback (most recent call last):
          File "test.py", line 108, in <module>
            record = test.run(argv.test_day, 15, 1)
          File "F:\MAPDN\utilities\tester.py", line 43, in run
            self.env.render()
          File "F:\MAPDN\environments\var_voltage_control\voltage_control_env.py", line 648, in render
            self.init_render()
          File "F:\MAPDN\environments\var_voltage_control\voltage_control_env.py", line 642, in init_render
            from .rendering_voltage_control_env import Viewer
          File "F:\MAPDN\environments\var_voltage_control\rendering_voltage_control_env.py", line 6, in <module>
            from gym import error
          File "D:\anaconda3\envs\mapdn\lib\site-packages\gym\__init__.py", line 13, in <module>
            from gym.envs import make, spec, register
          File "D:\anaconda3\envs\mapdn\lib\site-packages\gym\envs\__init__.py", line 10, in <module>
            load_env_plugins()
          File "D:\anaconda3\envs\mapdn\lib\site-packages\gym\envs\registration.py", line 276, in load_env_plugins
            fn = plugin.load()
          File "D:\anaconda3\envs\mapdn\lib\site-packages\importlib_metadata\__init__.py", line 194, in load
            module = import_module(match.group('module'))
          File "D:\anaconda3\envs\mapdn\lib\importlib\__init__.py", line 127, in import_module
            return _bootstrap._gcd_import(name[level:], package, level)
          File "D:\anaconda3\envs\mapdn\lib\site-packages\ale_py\gym.py", line 5, in <module>
            from ale_py.roms.utils import rom_name_to_id, rom_id_to_name
          File "D:\anaconda3\envs\mapdn\lib\site-packages\ale_py\roms\__init__.py", line 94, in <module>
            _RESOLVED_ROMS = resolve_roms()
          File "D:\anaconda3\envs\mapdn\lib\site-packages\ale_py\roms\__init__.py", line 46, in resolve_roms
            supported, unsupported = package.resolve()
          File "D:\anaconda3\envs\mapdn\lib\site-packages\ale_py\roms\utils.py", line 60, in resolve
            lambda file: file.suffix == ".bin", resources.files(self.package).iterdir()
          File "D:\anaconda3\envs\mapdn\lib\site-packages\importlib_resources\_common.py", line 22, in files
            return from_package(get_package(package))
          File "D:\anaconda3\envs\mapdn\lib\site-packages\importlib_resources\_common.py", line 53, in get_package
            resolved = resolve(package)
          File "D:\anaconda3\envs\mapdn\lib\site-packages\importlib_resources\_common.py", line 44, in resolve
            return cand if isinstance(cand, types.ModuleType) else importlib.import_module(cand)
          File "D:\anaconda3\envs\mapdn\lib\importlib\__init__.py", line 127, in import_module
            return _bootstrap._gcd_import(name[level:], package, level)
          File "D:\anaconda3\envs\mapdn\lib\site-packages\atari_py\__init__.py", line 1, in <module>
            from .ale_python_interface import *
          File "D:\anaconda3\envs\mapdn\lib\site-packages\atari_py\ale_python_interface.py", line 18, in <module>
            'ale_interface/ale_c.dll'))
          File "D:\anaconda3\envs\mapdn\lib\ctypes\__init__.py", line 442, in LoadLibrary
            return self._dlltype(name)
          File "D:\anaconda3\envs\mapdn\lib\ctypes\__init__.py", line 364, in __init__
            self._handle = _dlopen(self._name, mode)
        OSError: [WinError 126]

    What should I do next? How should I use this module, rendering_voltage_control_env.py?

    opened by Shenjwei 5
  • Question about random seeds

    In your article ”During training, we randomly sample the initial state for an episode and each episode lasts for 240 time steps (i.e. a half day). Every experiment is run with 5 random seeds and the test results during training are given by the median and the 25%-75% quartile shading. Each test is conducted every 20 episodes with 10 random selected episodes for evaluation.“

    You say "Every experiment is run with 5 random seeds", but I didn't find this in the code. Can you tell me how to add random seeds? I have tried but it doesn't work.

    opened by zly987 4
  • how can your  environment  be easily and flexibly extended with more network topologies and data

    Dear authors, according to your paper "Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks", you said that the primary limitation of OPF is the need for an exact system model, and that your environment can be easily and flexibly extended with more network topologies and data.

    In the training stage, the agents are trained with pandapower on precise network cases. How can the training results be extended to other topologies?

    opened by zzqiang163 3
  •  q_weight= 0.1

    Is q_weight = 0.1 set so that the reactive power term has the same order of magnitude as the voltage deviation value? Can it be understood that the voltage deviation and the power loss each have a weight of 0.5?

    opened by zly987 2
  • Questions to ask about the output of test.py

        if argv.test_mode == 'single':
            # record = test.run(199, 23, 2) # (day, hour, 3min)
            # record = test.run(730, 23, 2) # (day, hour, 3min)
            record = test.run(argv.test_day, 23, 2)
            with open('test_record_'+log_name+f'_day{argv.test_day}'+'.pickle', 'wb') as f:
                pickle.dump(record, f, pickle.HIGHEST_PROTOCOL)

    I don't understand the code “record = test.run(argv.test_day, 23, 2)”. If test_day=730, when does the PV data start? Can you give me a specific moment to make it easy to understand? And why is the third parameter 2 instead of 3? If I want to test with the data of the last day, that is, day 1095, is the first parameter 1094?

    opened by zly987 2
  • Request: the source file of the model.p file in the var_voltage_control file/case33_3min_final

    Thank you very much for open-sourcing this multi-agent environment. But I have one request; I hope it's not too much to ask. I have converted the model.p file to an .npy file and learned some information from it, but I would like to learn more about the source code that generates the model.p file.

    opened by zly987 2
  • another problem Occurs during installation

    Sorry to disturb you again. I encountered some problems when installing the packages. At first it ran very smoothly, but then another problem interrupted the installation.

    Pip subprocess error: ERROR: Could not install packages due to an OSError: [WinError 5] : 'd:\anaconda3\envs\mapdn\scripts\pygmentize.exe' Consider using the --user option or check the permissions.

    failed

    CondaEnvException: Pip failed

    So how should I adjust the program?

    opened by Shenjwei 2
  • is communication helpful in the MARL

    Dear author, I am wondering whether it is helpful for an agent to incorporate other agents' information (e.g., state, policy) during testing, or whether it is not necessary for DERs to communicate with each other when they are connected to the main grid. Thank you for your time and insights!

    opened by DongChen06 1
  • Issue for "AttributeError: 'SQDDPG' object has no attribute 'agent_importance_vec'"

    Hi, I am trying to reproduce the results and have the following issue in "sqddpg_33_0_l1.out" while running "source train_case33.sh 0 l1 reproduction":

        Traceback (most recent call last):
          File "train.py", line 115, in <module>
            train.run(stat, i)
          File "/home/ben/ben/MAPDN/utilities/trainer.py", line 111, in run
            self.behaviour_net.train_process(stat, self)
          File "/home/ben/ben/MAPDN/models/model.py", line 213, in train_process
            value = self.value(state_, action_pol)
          File "/home/ben/ben/MAPDN/models/sqddpg.py", line 108, in value
            return self.marginal_contribution(obs, act)
          File "/home/ben/ben/MAPDN/models/sqddpg.py", line 66, in marginal_contribution
            subcoalition_map, grand_coalitions, individual_map = self.sample_grandcoalitions(batch_size) # shape = (b, n_s, n, n)
          File "/home/ben/ben/MAPDN/models/sqddpg.py", line 38, in sample_grandcoalitions
            agent_importance_vec = self.agent_importance_vec.unsqueeze(0).expand(batch_size*self.sample_size, self.n_)
          File "/home/ben/anaconda3/envs/mapdn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1266, in __getattr__
            type(self).__name__, name))
        AttributeError: 'SQDDPG' object has no attribute 'agent_importance_vec'

    May you help for this?

    thanks

    opened by yqlwqj 1
  • issue in setting up test file of the code

    I am trying to replicate the results of this code. I was struggling with the training step at first, but I somehow managed to run the train.py file. After training, I try to run the test.py file and I get an error. I am working on replicating your paper's results and the comparison you did for one of my course projects. Any help will be appreciated.

    opened by AizazHaider51214 1
  • Question about the 322-bus network

    Dear authors, thanks for the wonderful paper and repo. I have a question about how to generate the 322-bus network with SimBench. I read the SimBench paper and checked the SimBench website, but I did not find the corresponding 322-bus network in either. Thank you for your time and insights!

    opened by DongChen06 4