PyTorch implementations of deep reinforcement learning algorithms and environments

Overview

Deep Reinforcement Learning Algorithms with PyTorch


This repository contains PyTorch implementations of deep reinforcement learning algorithms and environments.

(To help you remember the things you learn about machine learning in general, write them down in Save All and try out the public deck there on Fast AI's machine learning textbook.)

Algorithms Implemented

  1. Deep Q Learning (DQN) (Mnih et al. 2013)
  2. DQN with Fixed Q Targets (Mnih et al. 2013)
  3. Double DQN (DDQN) (Hado van Hasselt et al. 2015)
  4. DDQN with Prioritised Experience Replay (Schaul et al. 2016)
  5. Dueling DDQN (Wang et al. 2016)
  6. REINFORCE (Williams 1992)
  7. Deep Deterministic Policy Gradients (DDPG) (Lillicrap et al. 2016)
  8. Twin Delayed Deep Deterministic Policy Gradients (TD3) (Fujimoto et al. 2018)
  9. Soft Actor-Critic (SAC) (Haarnoja et al. 2018)
  10. Soft Actor-Critic for Discrete Actions (SAC-Discrete) (Christodoulou 2019)
  11. Asynchronous Advantage Actor Critic (A3C) (Mnih et al. 2016)
  12. Synchronous Advantage Actor Critic (A2C)
  13. Proximal Policy Optimisation (PPO) (Schulman et al. 2017)
  14. DQN with Hindsight Experience Replay (DQN-HER) (Andrychowicz et al. 2018)
  15. DDPG with Hindsight Experience Replay (DDPG-HER) (Andrychowicz et al. 2018)
  16. Hierarchical-DQN (h-DQN) (Kulkarni et al. 2016)
  17. Stochastic NNs for Hierarchical Reinforcement Learning (SNN-HRL) (Florensa et al. 2017)
  18. Diversity Is All You Need (DIAYN) (Eysenbach et al. 2018)

All implementations are able to quickly solve Cart Pole (discrete actions), Mountain Car Continuous (continuous actions), Bit Flipping (discrete actions with dynamic goals) or Fetch Reach (continuous actions with dynamic goals). I plan to add more hierarchical RL algorithms soon.

Environments Implemented

  1. Bit Flipping Game (as described in Andrychowicz et al. 2018)
  2. Four Rooms Game (as described in Sutton et al. 1998)
  3. Long Corridor Game (as described in Kulkarni et al. 2016)
  4. Ant-{Maze, Push, Fall} (as described in Nachum et al. 2018 and their accompanying code)

Results

1. Cart Pole and Mountain Car

Below, various RL algorithms are shown successfully learning the discrete-action game Cart Pole and the continuous-action game Mountain Car. The mean result from running each algorithm with 3 random seeds is shown, with the shaded area representing plus and minus 1 standard deviation. The hyperparameters used can be found in results/Cart_Pole.py and results/Mountain_Car.py.

Cart Pole and Mountain Car Results

2. Hindsight Experience Replay (HER) Experiments

Below, the performance of DQN and DDPG with and without Hindsight Experience Replay (HER) is shown in the Bit Flipping (14 bits) and Fetch Reach environments described in the papers Hindsight Experience Replay 2018 and Multi-Goal Reinforcement Learning 2018. The results replicate those found in the papers and show how adding HER can allow an agent to solve problems that it otherwise would not be able to solve at all. Note that the same hyperparameters were used within each pair of agents, so the only difference between them was whether hindsight was used or not.

HER Experiment Results

3. Hierarchical Reinforcement Learning Experiments

The results on the left below show the performance of DQN and the hierarchical-DQN algorithm from Kulkarni et al. 2016 on the Long Corridor environment, also explained in Kulkarni et al. 2016. The environment requires the agent to go to the end of a corridor before coming back in order to receive a larger reward. This delayed gratification, combined with the aliasing of states, makes the game practically impossible for DQN to learn, but if we introduce a meta-controller (as in h-DQN) that directs a lower-level controller how to behave, we are able to make more progress. This aligns with the results found in the paper.

The results on the right show the performance of DDQN and the Stochastic NNs for Hierarchical Reinforcement Learning (SNN-HRL) algorithm from Florensa et al. 2017. DDQN is used as the comparison because the implementation of SNN-HRL uses 2 DDQN algorithms within it. Note that the first 300 episodes of training for SNN-HRL were used for pre-training, which is why there is no reward for those episodes.

Long Corridor and Four Rooms

Usage

The repository's high-level structure is:

├── agents
│   ├── actor_critic_agents
│   ├── DQN_agents
│   ├── policy_gradient_agents
│   └── stochastic_policy_search_agents
├── environments
├── results
│   └── data_and_graphs
├── tests
└── utilities
    └── data_structures

i) To watch the agents learn the above games

To watch all the different agents learn Cart Pole follow these steps:

git clone https://github.com/p-christ/Deep_RL_Implementations.git
cd Deep_RL_Implementations

conda create --name myenvname
y
conda activate myenvname

pip3 install -r requirements.txt

python results/Cart_Pole.py

For other games, change the last line to one of the other files in the results folder.

ii) To train the agents on another game

Most OpenAI Gym environments should work. All you would need to do is change the config.environment field (see results/Cart_Pole.py for an example of this).

You can also play with your own custom game if you create a separate class that inherits from gym.Env. See environments/Four_Rooms_Environment.py for an example of a custom environment and then see the script results/Four_Rooms.py to see how to have agents play the environment. A minimal wiring sketch is shown below.
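As a rough illustration, the sketch below assumes the Config and Trainer interfaces used in results/Cart_Pole.py and the package layout shown above; the hyperparameters dictionary is left empty and would need to be copied from one of the existing results scripts.

import gym

from agents.DQN_agents.DQN import DQN
from agents.Trainer import Trainer
from utilities.data_structures.Config import Config

config = Config()
config.seed = 1
config.environment = gym.make("CartPole-v0")  # replace with your own gym.Env subclass
config.num_episodes_to_run = 450
config.use_GPU = False
config.hyperparameters = {
    # copy the relevant agent hyperparameters block from results/Cart_Pole.py
}

trainer = Trainer(config, [DQN])
trainer.run_games_for_agents()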

Comments
  • Mean of expectation in SAC_discrete.py possibly wrong?

    Mean of expectation in SAC_discrete.py possibly wrong?

    Hi Petros,

    in your SAC_discrete code you are using the following in SAC_Discrete.py:

    min_qf_next_target = action_probabilities * (torch.min(qf1_next_target, qf2_next_target) - self.alpha * log_action_probabilities)
    min_qf_next_target = min_qf_next_target.mean(dim=1).unsqueeze(-1)
    

    So it looks like you're effectively taking the mean of the probability-weighted Q-values in the last line? However, the weights are already provided by action_probabilities, so instead of doing .mean(dim=1) it should be .sum(dim=1) in order to give you the proper expectation of the next q value for each item in the batch. Or what am I missing here?
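    For illustration, here is a hedged sketch of the fix being suggested (variable names mirror the snippet above; this is the commenter's proposed change, not necessarily the repository's final code):

    # Expected soft value of the next state under the current policy:
    # E_{a ~ pi}[ min(Q1, Q2)(s', a) - alpha * log pi(a|s') ]
    # action_probabilities already weights each action, so summing over the
    # action dimension gives the expectation; .mean(dim=1) would divide by
    # the number of actions a second time.
    min_qf_next_target = action_probabilities * (
        torch.min(qf1_next_target, qf2_next_target)
        - self.alpha * log_action_probabilities
    )
    min_qf_next_target = min_qf_next_target.sum(dim=1).unsqueeze(-1)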

    Running this on LunarLander I also noted that the automatic entropy adjustment doesn't seem to work well. The original SAC paper mentions there is no "upper bound" in the calculation since the value should be tightly coupled to the constrained optimization problem. However, in your implementation the self.alpha value can go very high! I would have expected this value to be close / lower than the initial target_entropy level, which in the case of LunarLander is -log(1/4)*0.98 = 1.35, but I've seen it go to over 20! Do you have an explanation for that by chance?

    Thanks for your work, it's very appreciated btw

    opened by NikEyX 11
  • Sac Discrete Error

    Sac Discrete Error

    Hi I'm trying to run SAC Discrete and I keep getting following error

    Warning: Error detected in AddmmBackward. Traceback of forward call that caused the error:
      File "results/Cart_Pole.py", line 144, in <module>
        trainer.run_games_for_agents()
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/Trainer.py", line 79, in run_games_for_agents
        self.run_games_for_agent(agent_number + 1, agent_class)
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/Trainer.py", line 117, in run_games_for_agent
        game_scores, rolling_scores, time_taken = agent.run_n_episodes()
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/Base_Agent.py", line 189, in run_n_episodes
        self.step()
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/actor_critic_agents/SAC.py", line 87, in step
        self.learn()
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/actor_critic_agents/SAC.py", line 147, in learn
        policy_loss, log_pi = self.calculate_actor_loss(state_batch)
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/actor_critic_agents/SAC_Discrete.py", line 87, in calculate_actor_loss
        qf2_pi = self.critic_local_2(state_batch)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/nn_builder/pytorch/NN.py", line 119, in forward
        out = self.process_output_layers(x)
      File "/usr/local/lib/python3.6/dist-packages/nn_builder/pytorch/NN.py", line 163, in process_output_layers
        temp_output = output_layer(x)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/linear.py", line 87, in forward
        return F.linear(input, self.weight, self.bias)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1610, in linear
        ret = torch.addmm(bias, input, weight.t())
     (print_stack at /pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:60)
    Traceback (most recent call last):
      File "results/Cart_Pole.py", line 144, in <module>
        trainer.run_games_for_agents()
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/Trainer.py", line 79, in run_games_for_agents
        self.run_games_for_agent(agent_number + 1, agent_class)
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/Trainer.py", line 117, in run_games_for_agent
        game_scores, rolling_scores, time_taken = agent.run_n_episodes()
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/Base_Agent.py", line 189, in run_n_episodes
        self.step()
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/actor_critic_agents/SAC.py", line 87, in step
        self.learn()
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/actor_critic_agents/SAC.py", line 150, in learn
        self.update_all_parameters(qf1_loss, qf2_loss, policy_loss, alpha_loss)
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/actor_critic_agents/SAC.py", line 192, in update_all_parameters
        self.hyperparameters["Actor"]["gradient_clipping_norm"])
      File "/home/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/agents/Base_Agent.py", line 283, in take_optimisation_step
        loss.backward(retain_graph=retain_graph) #this calculates the gradients
      File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 198, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 100, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 2]], which is output 0 of TBackward, is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
    

    Any thoughts?

    opened by sshillo 4
  • Bug fix for SAC-discrete.

    Bug fix for SAC-discrete.

    Hi, thank you for your great work :)

    I fixed some bugs related to #54 and #56. I tested it in CartPole.py and saw the training converged more stably.

    Changes

    • [x] fix min_qf_next_target to properly calculate expectations over policy.
    • [x] similarly fix policy_loss.
    • [x] fix max_probability_action to properly get argmax over each sample, not entire batch. (It doesn't actually affect the algorithm, though.)
    • [x] fix device for Replay_Memory to train SAC-Discrete on CPU.
    • [x] fix errors due to the update of PyTorch (now torch>=1.4.0 works!!).

    Thanks :)

    opened by toshikwa 3
  • suggestion

    suggestion

    Hi, thanks a lot for sharing your code, it's very nice, clean and readable. I want to run it in the discrete Mountain Car environment; any suggestions for parameters or networks to get better results? Regards

    opened by m1996 3
  • Wrong temperature loss implementation for discrete SAC

    Wrong temperature loss implementation for discrete SAC

    In the discrete-SAC paper, the temperature loss in Eq. (11) indicates that the direct expectation should be calculated rather than a Monte Carlo estimate, following the same logic as Eq. (10). The implementation, however, still calls calculate_entropy_tuning_loss in SAC.py, which uses .mean().
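    A hedged sketch of what the direct expectation in Eq. (11) looks like for discrete actions (names follow those used elsewhere in SAC_Discrete.py; this illustrates the commenter's reading of the paper rather than the repository's code):

    # Temperature loss as a closed-form expectation over the discrete policy,
    # instead of a Monte Carlo estimate from sampled log-probabilities.
    alpha_loss = (action_probabilities.detach() * (
        -self.log_alpha * (log_action_probabilities + self.target_entropy).detach()
    )).sum(dim=1).mean()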

    opened by qiyan98 2
  • Dqn problem

    Dqn problem

    Hi again, I've taken a look at your DQN algorithm. I've heard that DQN has 2 nets, one for Q and one for the target, but I cannot find your target network... where is it? Does it exist? Regards

    opened by m1996 2
  • DDPG: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm

    DDPG: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm

    (rl) D:\Downloads\Deep-Reinforcement-Learning-Algorithms-with-PyTorch>python results/Mountain_Car.py
    AGENT NAME: TD3
    1.1: TD3
    TITLE MountainCarContinuous
    {'Actor': {'learning_rate': 0.003, 'linear_hidden_units': [20, 20], 'final_layer_activation': None, 'batch_norm': False, 'tau': 0.005, 'gradient_clipping_norm': 5, 'initialiser': 'Xavier', 'output_activation': None, 'hidden_activations': 'relu', 'dropout': 0.0, 'columns_of_data_to_be_embedded': [], 'embedding_dimensions': [], 'y_range': ()}, 'Critic': {'learning_rate': 0.02, 'linear_hidden_units': [20, 20], 'final_layer_activation': None, 'batch_norm': False, 'buffer_size': 1000000, 'tau': 0.005, 'gradient_clipping_norm': 5, 'initialiser': 'Xavier', 'output_activation': None, 'hidden_activations': 'relu', 'dropout': 0.0, 'columns_of_data_to_be_embedded': [], 'embedding_dimensions': [], 'y_range': ()}, 'min_steps_before_learning': 1000, 'batch_size': 256, 'discount_rate': 0.99, 'mu': 0.0, 'theta': 0.15, 'sigma': 0.25, 'action_noise_std': 0.2, 'action_noise_clipping_range': 0.5, 'update_every_n_steps': 20, 'learning_updates_per_learning_session': 10, 'automatically_tune_entropy_hyperparameter': True, 'entropy_term_weight': None, 'add_extra_noise': True, 'do_evaluation_iterations': True, 'clip_rewards': False}
    RANDOM SEED 1187266124
    Traceback (most recent call last):
      File "results/Mountain_Car.py", line 100, in <module>
        trainer.run_games_for_agents()
      File "D:\Downloads\Deep-Reinforcement-Learning-Algorithms-with-PyTorch\agents\Trainer.py", line 79, in run_games_for_agents
        self.run_games_for_agent(agent_number + 1, agent_class)
      File "D:\Downloads\Deep-Reinforcement-Learning-Algorithms-with-PyTorch\agents\Trainer.py", line 117, in run_games_for_agent
        game_scores, rolling_scores, time_taken = agent.run_n_episodes()
      File "D:\Downloads\Deep-Reinforcement-Learning-Algorithms-with-PyTorch\agents\Base_Agent.py", line 189, in run_n_episodes
        self.step()
      File "D:\Downloads\Deep-Reinforcement-Learning-Algorithms-with-PyTorch\agents\actor_critic_agents\DDPG.py", line 40, in step
        self.critic_learn(states, actions, rewards, next_states, dones)
      File "D:\Downloads\Deep-Reinforcement-Learning-Algorithms-with-PyTorch\agents\actor_critic_agents\TD3.py", line 36, in critic_learn
        critic_targets_next = self.compute_critic_values_for_next_states(next_states)
      File "D:\Downloads\Deep-Reinforcement-Learning-Algorithms-with-PyTorch\agents\actor_critic_agents\TD3.py", line 27, in compute_critic_values_for_next_states
        actions_next = self.actor_target(next_states)
      File "C:\ProgramData\Anaconda3\envs\rl\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\ProgramData\Anaconda3\envs\rl\lib\site-packages\nn_builder\pytorch\NN.py", line 118, in forward
        x = self.process_hidden_layers(x)
      File "C:\ProgramData\Anaconda3\envs\rl\lib\site-packages\nn_builder\pytorch\NN.py", line 153, in process_hidden_layers
        x = self.get_activation(self.hidden_activations, layer_ix)(linear_layer(x))
      File "C:\ProgramData\Anaconda3\envs\rl\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\ProgramData\Anaconda3\envs\rl\lib\site-packages\torch\nn\modules\linear.py", line 87, in forward
        return F.linear(input, self.weight, self.bias)
      File "C:\ProgramData\Anaconda3\envs\rl\lib\site-packages\torch\nn\functional.py", line 1370, in linear
        ret = torch.addmm(bias, input, weight.t())
    RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm

    opened by MeixinZhu 2
  • Problems running the A3C algorithm

    Problems running the A3C algorithm

    Hi and thanks for a great repo!

    I have some problems running the A3C algorithm in Cart_Pole.py

    I get the error:

    ...pytorch\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    TypeError: can't pickle _thread.RLock objects

    ...pytorch\lib\multiprocessing\spawn.py", line 115, in _main
        self = reduction.pickle.load(from_parent)
    EOFError: Ran out of input

    Do you know what can be the problem?

    Br, Christofer

    opened by christofer-f 2
  • An question about the actor loss calculation in `SAC_Discrete.py`.

    An question about the actor loss calculation in `SAC_Discrete.py`.

    Hi, thank you for your great work, it really helps a lot! I am wondering if you forgot to multiply the log_action_probabilities by self.alpha in line 83 of SAC_Discrete.py, which, I believe, should be revised as follows:

    inside_term = self.alpha * log_action_probabilities - min_qf_pi
    

    Please correct me if I am wrong about it.

    Btw, could you please explain why we should calculate the SAC actor loss in the discrete-action case the way you do in calculate_actor_loss of SAC_Discrete.py? More specifically, why should we multiply the inside_term described above by action_probabilities? (See the sketch below.)

    Thank you very much.
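    A hedged sketch of how the two points above fit together (names follow SAC_Discrete.py; this illustrates the suggestion, not the repository's code). With a discrete action set the expectation over the policy can be computed in closed form, so weighting by action_probabilities and summing over actions replaces the reparameterised sample used in continuous SAC:

    # J_pi = E_{s ~ D}[ pi(s)^T (alpha * log pi(s) - min(Q1, Q2)(s)) ]
    inside_term = self.alpha * log_action_probabilities - min_qf_pi
    policy_loss = (action_probabilities * inside_term).sum(dim=1).mean()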

    opened by ChangyWen 2
  • Results.X.py missing correct import path for Trainer

    Results.X.py missing correct import path for Trainer

    Importing Trainer with from Trainer import Trainer

    fails when running under Eclipse pipenv.

    To be consistent with other imports, modify it to from agents.Trainer import Trainer

    opened by crashmatt 2
  • Fetch-Reach Result not running

    Fetch-Reach Result not running

    Looks like there may be a commit missing for the FetchReach result. When I run it I get

    KeyError: 'linear_hidden_units'

    When I add the hyperparameter manually, an assertion error comes up

    AssertionError: X should be a 2-dimensional tensor

    opened by MishaLaskin 2
  • A question about critic-loss in discrete sac?

    A question about critic-loss in discrete sac?

    I applied the discrete SAC code to a custom discrete-action environment. During training, I found that the critic loss did not decrease but increased, and the value after the increase was very large, even reaching 200+. What is causing this, and how can I fix it? Thanks.

    opened by outshine-J 5
  • Bump numpy from 1.15.2 to 1.22.0

    Bump numpy from 1.15.2 to 1.22.0

    Bumps numpy from 1.15.2 to 1.22.0.


    dependencies 
    opened by dependabot[bot] 1
  • A question about DQN_With_Fixed_Q_Targets.

    A question about DQN_With_Fixed_Q_Targets.

    According to the paper, the target network should only be updated some number of steps after the local network updates, but your code does not seem to do this. In your code, each local network update is immediately followed by a soft update of the target network. I think there needs to be some time between local network updates and target network updates.

    def learn(self, experiences=None):
        """Runs a learning iteration for the Q network"""
        super(DQN_With_Fixed_Q_Targets, self).learn(experiences=experiences)
        self.soft_update_of_target_network(self.q_network_local, self.q_network_target,
                                           self.hyperparameters["tau"])
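    For comparison, a hedged sketch of the two update schemes being discussed (target_net, local_net, tau, step_count and target_update_period are hypothetical names for illustration, not attributes from the repository):

    # Soft (Polyak) update, applied after every learning step (what the code above does):
    # theta_target <- tau * theta_local + (1 - tau) * theta_target
    for target_param, local_param in zip(target_net.parameters(), local_net.parameters()):
        target_param.data.copy_(tau * local_param.data + (1.0 - tau) * target_param.data)

    # Periodic hard update, as described in the DQN paper: copy the weights every C steps.
    if step_count % target_update_period == 0:  # hypothetical counter and period
        target_net.load_state_dict(local_net.state_dict())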

    opened by LLYYKK 0
  • KeyError: 'exploration_worker_difference'

    KeyError: 'exploration_worker_difference'

    In four-rooms, under if __name__ == '__main__':

    AGENTS = [A3C] #DIAYN] # DDQN] #SNN_HRL] #, DDQN]
    trainer = Trainer(config, AGENTS)
    trainer.run_games_for_agents()
    

    Calling the A3C algorithm raises the error:

    File "D:\Pycharm\test\Deep-Reinforcement-Learning-Algorithms-with-PyTorch-master\agents\actor_critic_agents\A3C.py", line 98, in __init__
        self.exploration_worker_difference = self.config.hyperparameters["exploration_worker_difference"]
    KeyError: 'exploration_worker_difference'
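    A hedged workaround sketch: the A3C agent reads this key from its hyperparameters, so adding it to the actor-critic hyperparameters block in the results script should remove the KeyError (the "Actor_Critic_Agents" key path and the value 2.0 are assumptions based on the Cart Pole config, not a verified fix):

    # In results/Four_Rooms.py, alongside the other actor-critic hyperparameters:
    config.hyperparameters["Actor_Critic_Agents"]["exploration_worker_difference"] = 2.0  # illustrative value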

    opened by HHY1123 0
  • Question on SAC implementation

    Question on SAC implementation

    In SAC.py line 120 https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/blob/b338c87bebb672e39304e47e0eed55aeb462b243/agents/actor_critic_agents/SAC.py#L120 However, the output of produce_action_and_action_info(state) is https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/blob/b338c87bebb672e39304e47e0eed55aeb462b243/agents/actor_critic_agents/SAC.py#L135 So, even though the SAC algorithm can work in practice, is this a mistake?

    opened by fokx 0
Owner
Petros Christodoulou
Independent and minimal implementations of some reinforcement learning algorithms using PyTorch (including PPO, A3C, A2C, ...).

PyTorch RL Minimal Implementations There are implementations of some reinforcement learning algorithms, whose characteristics are as follow: Less pack

Gemini Light 4 Dec 31, 2022
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.


DLR-RM 4.7k Jan 1, 2023
Scripts of Machine Learning Algorithms from Scratch. Implementations of machine learning models and algorithms using nothing but NumPy with a focus on accessibility. Aims to cover everything from basic to advance.

Algo-ScriptML Python implementations of some of the fundamental Machine Learning models and algorithms from scratch. The goal of this project is not t

Algo Phantoms 81 Nov 26, 2022
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

Machine Learning From Scratch About Python implementations of some of the fundamental Machine Learning models and algorithms from scratch. The purpose

Erik Linder-Norén 21.8k Jan 9, 2023
PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and reinforcement learning

safe-control-gym Physics-based CartPole and Quadrotor Gym environments (using PyBullet) with symbolic a priori dynamics (using CasADi) for learning-ba

Dynamic Systems Lab 300 Dec 28, 2022
CompilerGym is a library of easy to use and performant reinforcement learning environments for compiler tasks


Facebook Research 721 Jan 3, 2023
gym-anm is a framework for designing reinforcement learning (RL) environments that model Active Network Management (ANM) tasks in electricity distribution networks.

gym-anm is a framework for designing reinforcement learning (RL) environments that model Active Network Management (ANM) tasks in electricity distribution networks. It is built on top of the OpenAI Gym toolkit.

Robin Henry 99 Dec 12, 2022
Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments (CoRL 2020)

Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments [Project website] [Paper] This project is a PyTorch

Cognitive Learning for Vision and Robotics (CLVR) lab @ USC 49 Nov 28, 2022
Multi-objective gym environments for reinforcement learning.

MO-Gym: Multi-Objective Reinforcement Learning Environments Gym environments for multi-objective reinforcement learning (MORL). The environments follo

Lucas Alegre 74 Jan 3, 2023
Pytorch Implementations of large number classical backbone CNNs, data enhancement, torch loss, attention, visualization and some common algorithms.

Torch-template-for-deep-learning Pytorch implementations of some **classical backbone CNNs, data enhancement, torch loss, attention, visualization and

Li Shengyan 270 Dec 31, 2022
deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.


null 63 Oct 17, 2022
PyTorch implementations of algorithms for density estimation

pytorch-flows A PyTorch implementations of Masked Autoregressive Flow and some other invertible transformations from Glow: Generative Flow with Invert

Ilya Kostrikov 546 Dec 5, 2022
Offline Multi-Agent Reinforcement Learning Implementations: Solving Overcooked Game with Data-Driven Method

Overcooked-AI We suppose to apply traditional offline reinforcement learning technique to multi-agent algorithm. In this repository, we implemented be

Baek In-Chang 14 Sep 16, 2022
Reinforcement learning framework and algorithms implemented in PyTorch.


Robotic AI & Learning Lab Berkeley 2.1k Jan 4, 2023
Conservative Q Learning for Offline Reinforcement Learning in JAX

CQL-JAX This repository implements Conservative Q Learning for Offline Reinforcement Learning in JAX (FLAX). Implementation is built on

Karush Suri 8 Nov 7, 2022
Reinforcement-learning - Repository of the class assignment questions for the course on reinforcement learning

DSE 314/614: Reinforcement Learning This repository containing reinforcement lea

Manav Mishra 4 Apr 15, 2022
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).


Ilya Kostrikov 3k Dec 31, 2022
A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading

A tour through tensorflow with financial data I present several models ranging in complexity from simple regression to LSTM and policy networks. The s

null 195 Dec 7, 2022