PyTorch implementations of deep reinforcement learning algorithms and environments

Overview

Deep Reinforcement Learning Algorithms with PyTorch


This repository contains PyTorch implementations of deep reinforcement learning algorithms and environments.

(To help you remember the things you learn about machine learning in general, write them down in Save All and try out the public deck there on Fast AI's machine learning textbook.)

Algorithms Implemented

  1. Deep Q Learning (DQN) (Mnih et al. 2013)
  2. DQN with Fixed Q Targets (Mnih et al. 2013)
  3. Double DQN (DDQN) (Hado van Hasselt et al. 2015)
  4. DDQN with Prioritised Experience Replay (Schaul et al. 2016)
  5. Dueling DDQN (Wang et al. 2016)
  6. REINFORCE (Williams 1992)
  7. Deep Deterministic Policy Gradients (DDPG) (Lillicrap et al. 2016)
  8. Twin Delayed Deep Deterministic Policy Gradients (TD3) (Fujimoto et al. 2018)
  9. Soft Actor-Critic (SAC) (Haarnoja et al. 2018)
  10. Soft Actor-Critic for Discrete Actions (SAC-Discrete) (Christodoulou 2019)
  11. Asynchronous Advantage Actor Critic (A3C) (Mnih et al. 2016)
  12. Synchronous Advantage Actor Critic (A2C)
  13. Proximal Policy Optimisation (PPO) (Schulman et al. 2017)
  14. DQN with Hindsight Experience Replay (DQN-HER) (Andrychowicz et al. 2018)
  15. DDPG with Hindsight Experience Replay (DDPG-HER) (Andrychowicz et al. 2018)
  16. Hierarchical-DQN (h-DQN) (Kulkarni et al. 2016)
  17. Stochastic NNs for Hierarchical Reinforcement Learning (SNN-HRL) (Florensa et al. 2017)
  18. Diversity Is All You Need (DIAYN) (Eysenbach et al. 2018)

All implementations are able to quickly solve Cart Pole (discrete actions), Mountain Car Continuous (continuous actions), Bit Flipping (discrete actions with dynamic goals) or Fetch Reach (continuous actions with dynamic goals). I plan to add more hierarchical RL algorithms soon.

Environments Implemented

  1. Bit Flipping Game (as described in Andrychowicz et al. 2018)
  2. Four Rooms Game (as described in Sutton et al. 1998)
  3. Long Corridor Game (as described in Kulkarni et al. 2016)
  4. Ant-{Maze, Push, Fall} (as described in Nachum et al. 2018 and their accompanying code)

Results

1. Cart Pole and Mountain Car

Below, various RL algorithms are shown successfully learning the discrete-action game Cart Pole and the continuous-action game Mountain Car. The mean result from running each algorithm with 3 random seeds is shown, with the shaded area representing plus and minus 1 standard deviation. The hyperparameters used can be found in results/Cart_Pole.py and results/Mountain_Car.py.

Cart Pole and Mountain Car Results

2. Hindsight Experience Replay (HER) Experiments

Below, the performance of DQN and DDPG with and without Hindsight Experience Replay (HER) is shown in the Bit Flipping (14 bits) and Fetch Reach environments described in the papers Hindsight Experience Replay 2018 and Multi-Goal Reinforcement Learning 2018. The results replicate those found in the papers and show how adding HER can allow an agent to solve problems that it otherwise would not be able to solve at all. Note that the same hyperparameters were used within each pair of agents, so the only difference between them was whether hindsight was used or not.

HER Experiment Results

3. Hierarchical Reinforcement Learning Experiments

The results on the left below show the performance of DQN and the hierarchical-DQN algorithm from Kulkarni et al. 2016 on the Long Corridor environment, also explained in Kulkarni et al. 2016. The environment requires the agent to go to the end of a corridor before coming back in order to receive a larger reward. This delayed gratification, combined with the aliasing of states, makes the game practically impossible for DQN to learn, but if we introduce a meta-controller (as in h-DQN) that directs a lower-level controller how to behave, we are able to make more progress. This aligns with the results found in the paper.

The results on the right show the performance of DDQN and the Stochastic NNs for Hierarchical Reinforcement Learning (SNN-HRL) algorithm from Florensa et al. 2017. DDQN is used as the comparison because the implementation of SNN-HRL uses 2 DDQN algorithms within it. Note that the first 300 episodes of training for SNN-HRL were used for pre-training, which is why there is no reward for those episodes.

Long Corridor and Four Rooms

Usage

The repository's high-level structure is:

├── agents
│   ├── actor_critic_agents
│   ├── DQN_agents
│   ├── policy_gradient_agents
│   └── stochastic_policy_search_agents
├── environments
├── results
│   └── data_and_graphs
├── tests
└── utilities
    └── data_structures

i) To watch the agents learn the above games

To watch all the different agents learn Cart Pole follow these steps:

git clone https://github.com/p-christ/Deep_RL_Implementations.git
cd Deep_RL_Implementations

conda create --name myenvname
y
conda activate myenvname

pip3 install -r requirements.txt

python results/Cart_Pole.py

For other games, change the last line to one of the other files in the results folder.

ii) To train the agents on another game

Most OpenAI Gym environments should work. All you would need to do is change the config.environment field (see results/Cart_Pole.py for an example of this).

You can also play with your own custom game if you create a separate class that inherits from gym.Env. See environments/Four_Rooms_Environment.py for an example of a custom environment and then see the script results/Four_Rooms.py to see how to have agents play the environment. A minimal wiring sketch is shown below.
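As a rough illustration, the sketch below assumes the Config and Trainer interfaces used in results/Cart_Pole.py and the package layout shown above; the hyperparameters dictionary is left empty and would need to be copied from one of the existing results scripts.

import gym

from agents.DQN_agents.DQN import DQN
from agents.Trainer import Trainer
from utilities.data_structures.Config import Config

config = Config()
config.seed = 1
config.environment = gym.make("CartPole-v0")  # replace with your own gym.Env subclass
config.num_episodes_to_run = 450
config.use_GPU = False
config.hyperparameters = {
    # copy the relevant agent hyperparameters block from results/Cart_Pole.py
}

trainer = Trainer(config, [DQN])
trainer.run_games_for_agents()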

Comments
  • Mean of expectation in SAC_discrete.py possibly wrong?

    Mean of expectation in SAC_discrete.py possibly wrong?

    Hi Petros,

    in your SAC_discrete code you are using the following in SAC_Discrete.py:

    min_qf_next_target = action_probabilities * (torch.min(qf1_next_target, qf2_next_target) - self.alpha * log_action_probabilities)
    min_qf_next_target = min_qf_next_target.mean(dim=1).unsqueeze(-1)
    

    So it looks like you're effectively taking the mean of the probability-weighted Q-values in the last line? However, the weights are already provided by action_probabilities, so instead of doing .mean(dim=1) it should be .sum(dim=1) in order to give you the proper expectation of the next q value for each item in the batch. Or what am I missing here?
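    For illustration, here is a hedged sketch of the fix being suggested (variable names mirror the snippet above; this is the commenter's proposed change, not necessarily the repository's final code):

    # Expected soft value of the next state under the current policy:
    # E_{a ~ pi}[ min(Q1, Q2)(s', a) - alpha * log pi(a|s') ]
    # action_probabilities already weights each action, so summing over the
    # action dimension gives the expectation; .mean(dim=1) would divide by
    # the number of actions a second time.
    min_qf_next_target = action_probabilities * (
        torch.min(qf1_next_target, qf2_next_target)
        - self.alpha * log_action_probabilities
    )
    min_qf_next_target = min_qf_next_target.sum(dim=1).unsqueeze(-1)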

    Running this on LunarLander I also noted that the automatic entropy adjustment doesn't seem to work well. The original SAC paper mentions there is no "upper bound" in the calculation since the value should be tightly coupled to the constrained optimization problem. However, in your implementation the self.alpha value can go very high! I would have expected this value to be close / lower than the initial target_entropy level, which in the case of LunarLander is -log(1/4)*0.98 = 1.35, but I've seen it go to over 20! Do you have an explanation for that by chance?

    Thanks for your work, it's very appreciated btw

    opened by NikEyX 11
  • Sac Discrete Error

    Sac Discrete Error

    Hi I'm trying to run SAC Discrete and I keep getting following error

    Warning: Error detected in AddmmBackward. Traceback of forward call that caused the error:
      File "results/Cart_Pole.py", line 144, in <module>
        trainer.run_games_for_agents()
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/Trainer.py", line 79, in run_games_for_agents
        self.run_games_for_agent(agent_number + 1, agent_class)
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/Trainer.py", line 117, in run_games_for_agent
        game_scores, rolling_scores, time_taken = agent.run_n_episodes()
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/Base_Agent.py", line 189, in run_n_episodes
        self.step()
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/actor_critic_agents/SAC.py", line 87, in step
        self.learn()
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/actor_critic_agents/SAC.py", line 147, in learn
        policy_loss, log_pi = self.calculate_actor_loss(state_batch)
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/actor_critic_agents/SAC_Discrete.py", line 87, in calculate_actor_loss
        qf2_pi = self.critic_local_2(state_batch)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/nn_builder/pytorch/NN.py", line 119, in forward
        out = self.process_output_layers(x)
      File "/usr/local/lib/python3.6/dist-packages/nn_builder/pytorch/NN.py", line 163, in process_output_layers
        temp_output = output_layer(x)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/linear.py", line 87, in forward
        return F.linear(input, self.weight, self.bias)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1610, in linear
        ret = torch.addmm(bias, input, weight.t())
     (print_stack at /pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:60)
    Traceback (most recent call last):
      File "results/Cart_Pole.py", line 144, in <module>
        trainer.run_games_for_agents()
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/Trainer.py", line 79, in run_games_for_agents
        self.run_games_for_agent(agent_number + 1, agent_class)
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/Trainer.py", line 117, in run_games_for_agent
        game_scores, rolling_scores, time_taken = agent.run_n_episodes()
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/Base_Agent.py", line 189, in run_n_episodes
        self.step()
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/actor_critic_agents/SAC.py", line 87, in step
        self.learn()
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/actor_critic_agents/SAC.py", line 150, in learn
        self.update_all_parameters(qf1_loss, qf2_loss, policy_loss, alpha_loss)
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/actor_critic_agents/SAC.py", line 192, in update_all_parameters
        self.hyperparameters["Actor"]["gradient_clipping_norm"])
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/Base_Agent.py", line 283, in take_optimisation_step
        loss.backward(retain_graph=retain_graph) #this calculates the gradients
      File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 198, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 100, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 2]], which is output 0 of TBackward, is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
    

    Any thoughts?

    opened by sshillo 4
  • Bug fix for SAC-discrete.

    Bug fix for SAC-discrete.

    Hi, thank you for your great work :)

    I fixed some bugs related to #54 and #56. I tested it in CartPole.py and saw the training converged more stably.

    Changes

    • [x] fix min_qf_next_target to properly calculate expectations over policy.
    • [x] similarly fix policy_loss.
    • [x] fix max_probability_action to properly get argmax over each sample, not entire batch. (It doesn't actually affect the algorithm, though.)
    • [x] fix device for Replay_Memory to train SAC-Discrete on CPU.
    • [x] fix errors due to the update of PyTorch (now torch>=1.4.0 works!!).

    Thanks :)

    opened by toshikwa 3
  • suggestion

    suggestion

    Hi, thanks a lot for sharing your code, it's very nice, clean and readable. I want to run it in the discrete Mountain Car environment; any suggestions for parameters or networks to get better results? Regards

    opened by m1996 3
  • Wrong temperature loss implementation for discrete SAC

    Wrong temperature loss implementation for discrete SAC

    In the discrete-SAC paper, the temperature loss in Eq. (11) indicates that the direct expectation should be calculated rather than a Monte Carlo estimate, following the same logic as Eq. (10). The implementation, however, still calls calculate_entropy_tuning_loss in SAC.py, which uses .mean().
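    A hedged sketch of what the direct expectation in Eq. (11) looks like for discrete actions (names follow those used elsewhere in SAC_Discrete.py; this illustrates the commenter's reading of the paper rather than the repository's code):

    # Temperature loss as a closed-form expectation over the discrete policy,
    # instead of a Monte Carlo estimate from sampled log-probabilities.
    alpha_loss = (action_probabilities.detach() * (
        -self.log_alpha * (log_action_probabilities + self.target_entropy).detach()
    )).sum(dim=1).mean()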

    opened by qiyan98 2
  • Dqn problem

    Dqn problem

    Hi again, I've taken a look at your DQN algorithm. I've heard that DQN has 2 nets, one for Q and one for the target, but I cannot find your target network... where is it? Does it exist? Regards

    opened by m1996 2
  • DDPG: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm

    DDPG: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm

    (rl) D:\Downloads\Deep-Reinforcement-Learning-Algorithms-with-PyTorch>python results/Mountain_Car.py
    AGENT NAME: TD3
    1.1: TD3
    TITLE MountainCarContinuous
    {'Actor': {'learning_rate': 0.003, 'linear_hidden_units': [20, 20], 'final_layer_activation': None, 'batch_norm': False, 'tau': 0.005, 'gradient_clipping_norm': 5, 'initialiser': 'Xavier', 'output_activation': None, 'hidden_activations': 'relu', 'dropout': 0.0, 'columns_of_data_to_be_embedded': [], 'embedding_dimensions': [], 'y_range': ()}, 'Critic': {'learning_rate': 0.02, 'linear_hidden_units': [20, 20], 'final_layer_activation': None, 'batch_norm': False, 'buffer_size': 1000000, 'tau': 0.005, 'gradient_clipping_norm': 5, 'initialiser': 'Xavier', 'output_activation': None, 'hidden_activations': 'relu', 'dropout': 0.0, 'columns_of_data_to_be_embedded': [], 'embedding_dimensions': [], 'y_range': ()}, 'min_steps_before_learning': 1000, 'batch_size': 256, 'discount_rate': 0.99, 'mu': 0.0, 'theta': 0.15, 'sigma': 0.25, 'action_noise_std': 0.2, 'action_noise_clipping_range': 0.5, 'update_every_n_steps': 20, 'learning_updates_per_learning_session': 10, 'automatically_tune_entropy_hyperparameter': True, 'entropy_term_weight': None, 'add_extra_noise': True, 'do_evaluation_iterations': True, 'clip_rewards': False}
    RANDOM SEED 1187266124
    Traceback (most recent call last):
      File "results/Mountain_Car.py", line 100, in <module>
        trainer.run_games_for_agents()
      File "D:\Downloads\Deep-Reinforcement-Learning-Algorithms-with-PyTorch\agents\Trainer.py", line 79, in run_games_for_agents
        self.run_games_for_agent(agent_number + 1, agent_class)
      File "D:\Downloads\Deep-Reinforcement-Learning-Algorithms-with-PyTorch\agents\Trainer.py", line 117, in run_games_for_agent
        game_scores, rolling_scores, time_taken = agent.run_n_episodes()
      File "D:\Downloads\Deep-Reinforcement-Learning-Algorithms-with-PyTorch\agents\Base_Agent.py", line 189, in run_n_episodes
        self.step()
      File "D:\Downloads\Deep-Reinforcement-Learning-Algorithms-with-PyTorch\agents\actor_critic_agents\DDPG.py", line 40, in step
        self.critic_learn(states, actions, rewards, next_states, dones)
      File "D:\Downloads\Deep-Reinforcement-Learning-Algorithms-with-PyTorch\agents\actor_critic_agents\TD3.py", line 36, in critic_learn
        critic_targets_next = self.compute_critic_values_for_next_states(next_states)
      File "D:\Downloads\Deep-Reinforcement-Learning-Algorithms-with-PyTorch\agents\actor_critic_agents\TD3.py", line 27, in compute_critic_values_for_next_states
        actions_next = self.actor_target(next_states)
      File "C:\ProgramData\Anaconda3\envs\rl\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\ProgramData\Anaconda3\envs\rl\lib\site-packages\nn_builder\pytorch\NN.py", line 118, in forward
        x = self.process_hidden_layers(x)
      File "C:\ProgramData\Anaconda3\envs\rl\lib\site-packages\nn_builder\pytorch\NN.py", line 153, in process_hidden_layers
        x = self.get_activation(self.hidden_activations, layer_ix)(linear_layer(x))
      File "C:\ProgramData\Anaconda3\envs\rl\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\ProgramData\Anaconda3\envs\rl\lib\site-packages\torch\nn\modules\linear.py", line 87, in forward
        return F.linear(input, self.weight, self.bias)
      File "C:\ProgramData\Anaconda3\envs\rl\lib\site-packages\torch\nn\functional.py", line 1370, in linear
        ret = torch.addmm(bias, input, weight.t())
    RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm

    opened by MeixinZhu 2
  • Problems running the A3C algorithm

    Problems running the A3C algorithm

    Hi and thanks for a great repo!

    I have some problems running the A3C algorithm in Cart_Pole.py

    I get the error:

    ...pytorch\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    TypeError: can't pickle _thread.RLock objects

    ...pytorch\lib\multiprocessing\spawn.py", line 115, in _main
        self = reduction.pickle.load(from_parent)
    EOFError: Ran out of input

    Do you know what can be the problem?

    Br, Christofer

    opened by christofer-f 2
  • An question about the actor loss calculation in `SAC_Discrete.py`.

    An question about the actor loss calculation in `SAC_Discrete.py`.

    Hi, thank you for your great work, it really helps a lot! I am wondering if you forgot to multiply the log_action_probabilities by self.alpha in line 83 of SAC_Discrete.py, which, I believe, should be revised as follows:

    inside_term = self.alpha * log_action_probabilities - min_qf_pi
    

    Please correct me if I am wrong about it.

    Btw, could you please explain why we should calculate the SAC actor loss in the discrete-action case the way you do in calculate_actor_loss of SAC_Discrete.py? More specifically, why should we multiply the inside_term described above by action_probabilities? (See the sketch below.)

    Thank you very much.
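    A hedged sketch of how the two points above fit together (names follow SAC_Discrete.py; this illustrates the suggestion, not the repository's code). With a discrete action set the expectation over the policy can be computed in closed form, so weighting by action_probabilities and summing over actions replaces the reparameterised sample used in continuous SAC:

    # J_pi = E_{s ~ D}[ pi(s)^T (alpha * log pi(s) - min(Q1, Q2)(s)) ]
    inside_term = self.alpha * log_action_probabilities - min_qf_pi
    policy_loss = (action_probabilities * inside_term).sum(dim=1).mean()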

    opened by ChangyWen 2
  • Results.X.py missing correct import path for Trainer

    Results.X.py missing correct import path for Trainer

    Importing Trainer with from Trainer import Trainer

    fails when running under Eclipse pipenv.

    To be consistent with other imports, modify it to from agents.Trainer import Trainer

    opened by crashmatt 2
  • Fetch-Reach Result not running

    Fetch-Reach Result not running

    Looks like there may be a commit missing for the FetchReach result. When I run it I get

    KeyError: 'linear_hidden_units'

    When I add the hyperparameter manually, an assertion error comes up

    AssertionError: X should be a 2-dimensional tensor

    opened by MishaLaskin 2
  • A question about critic-loss in discrete sac?

    A question about critic-loss in discrete sac?

    I applied the discrete SAC code to a custom discrete-action environment. During training, I found that the critic loss did not decrease but increased, and the value after the increase was very large, even reaching 200+. What is causing this, and how can I fix it? Thanks.

    opened by outshine-J 5
  • Bump numpy from 1.15.2 to 1.22.0

    Bump numpy from 1.15.2 to 1.22.0

    Bumps numpy from 1.15.2 to 1.22.0.


    dependencies 
    opened by dependabot[bot] 1
  • A question about DQN_With_Fixed_Q_Targets.

    A question about DQN_With_Fixed_Q_Targets.

    According to the paper, the target network should only be updated some number of steps after the local network updates, but your code does not seem to do this. In your code, each local network update is immediately followed by a soft update of the target network. I think there needs to be some time between local network updates and target network updates.

    def learn(self, experiences=None):
        """Runs a learning iteration for the Q network"""
        super(DQN_With_Fixed_Q_Targets, self).learn(experiences=experiences)
        self.soft_update_of_target_network(self.q_network_local, self.q_network_target,
                                           self.hyperparameters["tau"])
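    For comparison, a hedged sketch of the two update schemes being discussed (target_net, local_net, tau, step_count and target_update_period are hypothetical names for illustration, not attributes from the repository):

    # Soft (Polyak) update, applied after every learning step (what the code above does):
    # theta_target <- tau * theta_local + (1 - tau) * theta_target
    for target_param, local_param in zip(target_net.parameters(), local_net.parameters()):
        target_param.data.copy_(tau * local_param.data + (1.0 - tau) * target_param.data)

    # Periodic hard update, as described in the DQN paper: copy the weights every C steps.
    if step_count % target_update_period == 0:  # hypothetical counter and period
        target_net.load_state_dict(local_net.state_dict())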

    opened by LLYYKK 0
  • KeyError: 'exploration_worker_difference'

    KeyError: 'exploration_worker_difference'

    In four-rooms, under if __name__ == '__main__':

    AGENTS = [A3C] #DIAYN] # DDQN] #SNN_HRL] #, DDQN]
    trainer = Trainer(config, AGENTS)
    trainer.run_games_for_agents()
    

    Calling the A3C algorithm raises the error:

    File "D:\Pycharm\test\Deep-Reinforcement-Learning-Algorithms-with-PyTorch-master\agents\actor_critic_agents\A3C.py", line 98, in __init__
        self.exploration_worker_difference = self.config.hyperparameters["exploration_worker_difference"]
    KeyError: 'exploration_worker_difference'
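    A hedged workaround sketch: the A3C agent reads this key from its hyperparameters, so adding it to the actor-critic hyperparameters block in the results script should remove the KeyError (the "Actor_Critic_Agents" key path and the value 2.0 are assumptions based on the Cart Pole config, not a verified fix):

    # In results/Four_Rooms.py, alongside the other actor-critic hyperparameters:
    config.hyperparameters["Actor_Critic_Agents"]["exploration_worker_difference"] = 2.0  # illustrative value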

    opened by HHY1123 0
  • Question on SAC implementation

    Question on SAC implementation

    In SAC.py line 120 https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/blob/b338c87bebb672e39304e47e0eed55aeb462b243/agents/actor_critic_agents/SAC.py#L120 However, the output of produce_action_and_action_info(state) is https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/blob/b338c87bebb672e39304e47e0eed55aeb462b243/agents/actor_critic_agents/SAC.py#L135 So, even though the SAC algorithm can work in practice, is this a mistake?

    opened by fokx 0
Owner
Petros Christodoulou
Independent and minimal implementations of some reinforcement learning algorithms using PyTorch (including PPO, A3C, A2C, ...).

PyTorch RL Minimal Implementations There are implementations of some reinforcement learning algorithms, whose characteristics are as follow: Less pack

Gemini Light 4 Dec 31, 2022
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.


DLR-RM 4.7k Jan 1, 2023
Scripts of Machine Learning Algorithms from Scratch. Implementations of machine learning models and algorithms using nothing but NumPy with a focus on accessibility. Aims to cover everything from basic to advance.

Algo-ScriptML Python implementations of some of the fundamental Machine Learning models and algorithms from scratch. The goal of this project is not t

Algo Phantoms 81 Nov 26, 2022
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

Machine Learning From Scratch About Python implementations of some of the fundamental Machine Learning models and algorithms from scratch. The purpose

Erik Linder-Norén 21.8k Jan 9, 2023
PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and reinforcement learning

safe-control-gym Physics-based CartPole and Quadrotor Gym environments (using PyBullet) with symbolic a priori dynamics (using CasADi) for learning-ba

Dynamic Systems Lab 300 Dec 28, 2022
CompilerGym is a library of easy to use and performant reinforcement learning environments for compiler tasks


Facebook Research 721 Jan 3, 2023
gym-anm is a framework for designing reinforcement learning (RL) environments that model Active Network Management (ANM) tasks in electricity distribution networks.

gym-anm is a framework for designing reinforcement learning (RL) environments that model Active Network Management (ANM) tasks in electricity distribution networks. It is built on top of the OpenAI Gym toolkit.

Robin Henry 99 Dec 12, 2022
Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments (CoRL 2020)

Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments [Project website] [Paper] This project is a PyTorch

Cognitive Learning for Vision and Robotics (CLVR) lab @ USC 49 Nov 28, 2022
Multi-objective gym environments for reinforcement learning.

MO-Gym: Multi-Objective Reinforcement Learning Environments Gym environments for multi-objective reinforcement learning (MORL). The environments follo

Lucas Alegre 74 Jan 3, 2023
Pytorch Implementations of large number classical backbone CNNs, data enhancement, torch loss, attention, visualization and some common algorithms.

Torch-template-for-deep-learning Pytorch implementations of some **classical backbone CNNs, data enhancement, torch loss, attention, visualization and

Li Shengyan 270 Dec 31, 2022
deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.


null 63 Oct 17, 2022
PyTorch implementations of algorithms for density estimation

pytorch-flows A PyTorch implementations of Masked Autoregressive Flow and some other invertible transformations from Glow: Generative Flow with Invert

Ilya Kostrikov 546 Dec 5, 2022
Offline Multi-Agent Reinforcement Learning Implementations: Solving Overcooked Game with Data-Driven Method

Overcooked-AI We suppose to apply traditional offline reinforcement learning technique to multi-agent algorithm. In this repository, we implemented be

Baek In-Chang 14 Sep 16, 2022
Reinforcement learning framework and algorithms implemented in PyTorch.


Robotic AI & Learning Lab Berkeley 2.1k Jan 4, 2023
Conservative Q Learning for Offline Reinforcement Learning in JAX

CQL-JAX This repository implements Conservative Q Learning for Offline Reinforcement Learning in JAX (FLAX). Implementation is built on

Karush Suri 8 Nov 7, 2022
Reinforcement-learning - Repository of the class assignment questions for the course on reinforcement learning

DSE 314/614: Reinforcement Learning This repository containing reinforcement lea

Manav Mishra 4 Apr 15, 2022
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).


Ilya Kostrikov 3k Dec 31, 2022
A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading

A tour through tensorflow with financial data I present several models ranging in complexity from simple regression to LSTM and policy networks. The s

null 195 Dec 7, 2022