RLkit

Reinforcement learning framework and algorithms implemented in PyTorch.

Implemented algorithms:

  • Skew-Fit
  • Reinforcement Learning with Imagined Goals (RIG)
  • Hindsight Experience Replay (HER)
  • DQN and Double DQN
  • DDPG
  • Soft Actor-Critic (SAC)
  • TD3
  • Advantage-Weighted Actor-Critic (AWAC)
  • Implicit Q-Learning (IQL)
  • SMAC (offline meta-reinforcement learning)

See the References section below for the corresponding papers. To get started, check out the example scripts in the examples/ directory.

What's New

Version 0.2

04/25/2019

  • Use new multiworld code that requires explicit environment registration.
  • Make installation easier by adding setup.py and using default conf.py.

04/16/2019

  • Log how many train steps were called
  • Log env_info and agent_info.

04/05/2019-04/15/2019

  • Add rendering
  • Fix SAC bug to account for future entropy (#41, #43)
  • Add online algorithm mode (#42)

04/05/2019

The initial release for 0.2 has the following major changes:

  • Remove Serializable class and use default pickle scheme.
  • Remove PyTorchModule class and use native torch.nn.Module directly.
  • Switch to batch-style training rather than online training.
    • Makes code more amenable to parallelization.
    • Implementing the online version is straightforward.
  • Refactor training code to be its own object, rather than being integrated inside of RLAlgorithm.
  • Refactor sampling code to be its own object, rather than being integrated inside of RLAlgorithm.
  • Implement Skew-Fit: State-Covering Self-Supervised Reinforcement Learning, a method for performing goal-directed exploration to maximize the entropy of visited states.
  • Update soft actor-critic to more closely match TensorFlow implementation:
    • Rename TwinSAC to just SAC.
    • Only have Q networks.
    • Remove unnecessary policy regularization terms.
    • Use numerically stable Jacobian computation (see the sketch below).
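
For reference, the "numerically stable Jacobian computation" refers to the log-determinant of the tanh squashing applied to the Gaussian policy sample. A standard way to compute log(1 - tanh(u)^2) stably is the identity sketched below; this shows the usual trick, not necessarily rlkit's exact code:

import math
import torch.nn.functional as F

def tanh_log_det_jacobian(pre_tanh_value):
    # log|d tanh(u)/du| = log(1 - tanh(u)^2), computed stably as
    # 2 * (log 2 - u - softplus(-2u)) instead of evaluating log(1 - tanh(u)^2) directly
    return 2.0 * (math.log(2.0) - pre_tanh_value - F.softplus(-2.0 * pre_tanh_value))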

Overall, the refactors are intended to make the code more modular and readable than the previous versions.

Version 0.1

12/04/2018

  • Add RIG implementation

12/03/2018

  • Add HER implementation
  • Add doodad support

10/16/2018

  • Upgraded to PyTorch v0.4
  • Added Twin Soft Actor Critic Implementation
  • Various small refactors (e.g., logger, evaluation code)

Installation

  1. Install and use the included Anaconda environment:
$ conda env create -f environment/[linux-cpu|linux-gpu|mac]-env.yml
$ source activate rlkit
(rlkit) $ python examples/ddpg.py

Choose the appropriate .yml file for your system. These Anaconda environments use MuJoCo 1.5 and gym 0.10.5. You'll need to get your own MuJoCo key if you want to use MuJoCo.

  2. Add this repo directory to your PYTHONPATH environment variable, or simply run:
pip install -e .
  3. (Optional) Copy conf.py to conf_private.py and edit it to override the defaults (see the sketch after this list):
cp rlkit/launchers/conf.py rlkit/launchers/conf_private.py
  4. (Optional) If you plan on running the Skew-Fit experiments or the HER example with the Sawyer environment, then you need to install multiworld.
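
If you only want to change where results are written, a minimal conf_private.py could look like the sketch below. This assumes conf_private.py simply mirrors the settings in conf.py; LOCAL_LOG_DIR is the only setting referenced elsewhere in this README, and the path shown is just an example:

# rlkit/launchers/conf_private.py -- minimal sketch
LOCAL_LOG_DIR = '/path/to/my/output'  # where experiment folders (and params.pkl) are written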

DISCLAIMER: the mac environment has only been tested without a GPU.

For an even more portable solution, try using the Docker image provided in environment/docker. The Anaconda env should be enough, but this Docker image addresses some of the rendering issues that may arise when using MuJoCo 1.5 with GPUs. The image supports GPU use, but it should also work without a GPU. To use a GPU with the image, you need to have nvidia-docker installed.

Using a GPU

You can use a GPU by calling

import rlkit.torch.pytorch_util as ptu
ptu.set_gpu_mode(True)

before launching the scripts.

If you are using doodad (see below), simply use the use_gpu flag:

run_experiment(..., use_gpu=True)

Visualizing a policy and seeing results

During training, results will be saved under

LOCAL_LOG_DIR/<exp_prefix>/<folder name>/

  • LOCAL_LOG_DIR is the directory set by rlkit.launchers.config.LOCAL_LOG_DIR. Default name is 'output'.
  • <exp_prefix> is given to setup_logger (a usage sketch follows the commands below).
  • <folder name> is auto-generated and based off of exp_prefix.
  • Inside this folder, you should see a file called params.pkl. To visualize a policy, run

(rlkit) $ python scripts/run_policy.py LOCAL_LOG_DIR/<exp_prefix>/<folder name>/params.pkl

or

(rlkit) $ python scripts/run_goal_conditioned_policy.py LOCAL_LOG_DIR/<exp_prefix>/<folder name>/params.pkl

depending on whether or not the policy is goal-conditioned.
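
For reference, <exp_prefix> and the auto-generated folder come from setup_logger. A minimal sketch of how it is typically called in the example scripts (the variant shown here is hypothetical, and the exact keyword arguments may differ across rlkit versions):

from rlkit.launchers.launcher_util import setup_logger

variant = dict(learning_rate=1e-3)  # hypothetical hyperparameters, saved alongside the results
setup_logger('my-experiment-name', variant=variant)
# training then writes its progress logs and params.pkl into
# LOCAL_LOG_DIR/my-experiment-name/<auto-generated folder name>/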

If you have rllab installed, you can also visualize the results using rllab's viskit, described at the bottom of this page.

tl;dr run

python rllab/viskit/frontend.py LOCAL_LOG_DIR/<exp_prefix>/

to visualize all experiments with a prefix of exp_prefix. To only visualize a single run, you can do

python rllab/viskit/frontend.py LOCAL_LOG_DIR/<exp_prefix>/<folder name>

Alternatively, if you don't want to clone all of rllab, a repository containing only viskit can be found here. You can similarly visualize results with:

python viskit/viskit/frontend.py LOCAL_LOG_DIR/<exp_prefix>/

This viskit repo also has a few extra nice features, like plotting multiple Y-axis values at once, figure-splitting on multiple keys, and being able to filter hyperparameters out.

Visualizing a goal-conditioned policy

To visualize a goal-conditioned policy, run

(rlkit) $ python scripts/run_goal_conditioned_policy.py LOCAL_LOG_DIR/<exp_prefix>/<folder name>/params.pkl

Launching jobs with doodad

The run_experiment function makes it easy to run Python code on Amazon Web Services (AWS) or Google Cloud Platform (GCP) by using this fork of doodad.

It's as easy as:

from rlkit.launchers.launcher_util import run_experiment

def function_to_run(variant):
    learning_rate = variant['learning_rate']
    ...

run_experiment(
    function_to_run,
    exp_prefix="my-experiment-name",
    mode='ec2',  # or 'gcp'
    variant={'learning_rate': 1e-3},
)

You will need to set up parameters in config.py (see the configuration step of Installation above). This requires some knowledge of AWS and/or GCP, which is beyond the scope of this README. To learn more about doodad, go to its repository, which is based on this original repository.

Requests for pull-requests

  • Implement policy-gradient algorithms.
  • Implement model-based algorithms.

Legacy Code (v0.1.2)

For Temporal Difference Models (TDMs) and the original implementation of Reinforcement Learning with Imagined Goals (RIG), run git checkout tags/v0.1.2.

References

The algorithms are based on the following papers:

Offline Meta-Reinforcement Learning with Online Self-Supervision. Vitchyr H. Pong, Ashvin Nair, Laura Smith, Catherine Huang, Sergey Levine. arXiv preprint, 2021.

Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. Vitchyr H. Pong*, Murtaza Dalal*, Steven Lin*, Ashvin Nair, Shikhar Bahl, Sergey Levine. ICML, 2020.

Visual Reinforcement Learning with Imagined Goals. Ashvin Nair*, Vitchyr Pong*, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine. NeurIPS 2018.

Temporal Difference Models: Model-Free Deep RL for Model-Based Control. Vitchyr Pong*, Shixiang Gu*, Murtaza Dalal, Sergey Levine. ICLR 2018.

Hindsight Experience Replay. Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba. NeurIPS 2017.

Deep Reinforcement Learning with Double Q-learning. Hado van Hasselt, Arthur Guez, David Silver. AAAI 2016.

Human-level control through deep reinforcement learning. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis. Nature 2015.

Soft Actor-Critic Algorithms and Applications. Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine. arXiv preprint, 2018.

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. ICML, 2018.

Addressing Function Approximation Error in Actor-Critic Methods. Scott Fujimoto, Herke van Hoof, David Meger. ICML, 2018.

Credits

This repository was initially developed primarily by Vitchyr Pong until July 2021, at which point it was transferred to the RAIL Berkeley organization; it is now primarily maintained by Ashvin Nair, together with other major collaborators and contributors.

A lot of the coding infrastructure is based on rllab. The serialization and logger code are basically a carbon copy of the rllab versions.

The Dockerfile is based on the OpenAI mujoco-py Dockerfile.

The SMAC code builds off of the PEARL code, which built off of an older RLKit version.

Comments
  • Performance on Hopper-v2

    originally posted under another issue but re-posting for visibility

    Sorry to open this up again, but I am unable to obtain comparable results to the TensorFlow implementation using the master branch. I post the training graphs for the PyTorch and TensorFlow implementations below for comparison. Both results were averaged over 5 seeds.

    TSAC Hopper-v2 Pytorch

    TSAC Hopper-v2 TF

    The TF implementation reaches a higher final performance and also learns faster. The shape of the TF curve also closely matches the shape of the graph in the paper, i.e., it quickly increases and plateaus at around 400 epochs.

    Does the pytorch graph look similar to what you obtained too?

    I just want to mention that your repo is awesome. Answering pestering questions from me is not your responsibility :) and I really appreciate any help here.

    opened by quanvuong 11
  • about cuda error during replay weight

    Hi there, I am trying to train SAC with HER. After getting the expert weights, when I try to visualize the policy, it raises the following error:

    Policy and environment loaded
    Traceback (most recent call last):
      File "scripts/run_goal_conditioned_policy.py", line 60, in simulate_policy(args)
      File "scripts/run_goal_conditioned_policy.py", line 15, in simulate_policy policy = data['trainer/policy']
      File "/home/yunchuz/rlkit/rlkit/samplers/rollout_functions.py", line 41, in multitask_rollout a, agent_info = agent.get_action(new_obs, **get_action_kwargs)
      File "/home/yunchuz/rlkit/rlkit/torch/sac/policies.py", line 63, in get_action actions = self.get_actions(obs_np[None], deterministic=deterministic)
      File "/home/yunchuz/rlkit/rlkit/torch/sac/policies.py", line 67, in get_actions return eval_np(self, obs_np, deterministic=deterministic)[0]
      File "/home/yunchuz/rlkit/rlkit/torch/core.py", line 18, in eval_np outputs = module(*torch_args, **torch_kwargs)
      File "/home/yunchuz/miniconda3/envs/rlkit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__ result = self.forward(*input, **kwargs)
      File "/home/yunchuz/rlkit/rlkit/torch/sac/policies.py", line 83, in forward h = self.hidden_activation(fc(h))
      File "/home/yunchuz/miniconda3/envs/rlkit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__ result = self.forward(*input, **kwargs)
      File "/home/yunchuz/miniconda3/envs/rlkit/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward return F.linear(input, self.weight, self.bias)
      File "/home/yunchuz/miniconda3/envs/rlkit/lib/python3.6/site-packages/torch/nn/functional.py", line 1370, in linear ret = torch.addmm(bias, input, weight.t())
    RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'mat1' in call to _th_addmm

    Do you know what the reason for this is?

    opened by YunchuZhang 9
  • [Question] Any idea why SAC loss would diverge?

    I left it running for a few epochs, several times to ensure that it was not a fluke.

    And SAC is collapsing to always choose the same action.

    replay_buffer/size                       210000
    trainer/QF1 Loss                              1.35779e+19
    trainer/QF2 Loss                              1.34288e+19
    trainer/Policy Loss                          -2.48799e+10
    trainer/Q1 Predictions Mean                   2.33888e+10
    trainer/Q1 Predictions Std                    3.70217e+09
    trainer/Q1 Predictions Max                    3.68046e+10
    trainer/Q1 Predictions Min                    1.31057e+10
    trainer/Q2 Predictions Mean                   2.34333e+10
    trainer/Q2 Predictions Std                    3.65296e+09
    trainer/Q2 Predictions Max                    3.66932e+10
    trainer/Q2 Predictions Min                    1.33272e+10
    trainer/Q Targets Mean                        2.36857e+10
    trainer/Q Targets Std                         4.52467e+09
    trainer/Q Targets Max                         3.54759e+10
    trainer/Q Targets Min                         0.224346
    trainer/Log Pis Mean                          0.987727
    trainer/Log Pis Std                           1.12239
    trainer/Log Pis Max                           2.15324
    trainer/Log Pis Min                          -4.0056
    trainer/Policy mu Mean                        1.52476
    trainer/Policy mu Std                         0.0895151
    trainer/Policy mu Max                         1.62818
    trainer/Policy mu Min                         1.37598
    trainer/Policy log std Mean                  -0.582497
    trainer/Policy log std Std                    0.0243203
    trainer/Policy log std Max                   -0.492316
    trainer/Policy log std Min                   -0.640244
    trainer/Alpha                                 5.56742e+08
    trainer/Alpha Loss                            0.247146
    exploration/num steps total                   2.491e+06
    exploration/num paths total               23586
    exploration/path length Mean                131.579
    exploration/path length Std                  57.1612
    exploration/path length Max                 200
    exploration/path length Min                   8
    exploration/Rewards Mean                      0.264324
    exploration/Rewards Std                       0.149922
    exploration/Rewards Max                       0.590382
    exploration/Rewards Min                       0.0141083
    exploration/Returns Mean                     34.7795
    exploration/Returns Std                      23.3818
    exploration/Returns Max                      83.2558
    exploration/Returns Min                       2.15501
    exploration/Actions Mean                      0.4906
    exploration/Actions Std                       0.0686414
    exploration/Actions Max                       0.5
    exploration/Actions Min                      -0.5
    exploration/Num Paths                        38
    exploration/Average Returns                  34.7795
    exploration/env_infos/final/time Mean         0.342105
    exploration/env_infos/final/time Std          0.285806
    exploration/env_infos/final/time Max          0.96
    exploration/env_infos/final/time Min          0
    exploration/env_infos/initial/time Mean       0.995
    exploration/env_infos/initial/time Std        3.33067e-16
    exploration/env_infos/initial/time Max        0.995
    exploration/env_infos/initial/time Min        0.995
    exploration/env_infos/time Mean               0.606472
    exploration/env_infos/time Std                0.263458
    exploration/env_infos/time Max                0.995
    exploration/env_infos/time Min                0
    evaluation/num steps total                    2.45463e+06
    evaluation/num paths total                21675
    evaluation/path length Mean                 115.452
    evaluation/path length Std                   52.2554
    evaluation/path length Max                  200
    evaluation/path length Min                    9
    evaluation/Rewards Mean                       0.248655
    evaluation/Rewards Std                        0.0242211
    evaluation/Rewards Max                        0.294154
    evaluation/Rewards Min                        0.193703
    evaluation/Returns Mean                      28.7078
    evaluation/Returns Std                       12.9204
    evaluation/Returns Max                       52.5658
    evaluation/Returns Min                        2.53809
    evaluation/Actions Mean                       0.5
    evaluation/Actions Std                        0
    evaluation/Actions Max                        0.5
    evaluation/Actions Min                        0.5
    evaluation/Num Paths                         42
    evaluation/Average Returns                   28.7078
    evaluation/env_infos/final/time Mean          0.422738
    evaluation/env_infos/final/time Std           0.261277
    evaluation/env_infos/final/time Max           0.955
    evaluation/env_infos/final/time Min           0
    evaluation/env_infos/initial/time Mean        0.995
    evaluation/env_infos/initial/time Std         2.22045e-16
    evaluation/env_infos/initial/time Max         0.995
    evaluation/env_infos/initial/time Min         0.995
    evaluation/env_infos/time Mean                0.64974
    evaluation/env_infos/time Std                 0.245087
    evaluation/env_infos/time Max                 0.995
    evaluation/env_infos/time Min                 0
    time/data storing (s)                         0.0476881
    time/evaluation sampling (s)                 13.4834
    time/exploration sampling (s)                15.2477
    time/logging (s)                              0.0254512
    time/saving (s)                               0.0218989
    time/training (s)                           111.327
    time/epoch (s)                              140.153
    time/total (s)                            68869.7
    Epoch                                       497
    

    Running it from master. Could it be related to the 'action' space being somewhat discrete? The environment discretizes the actions into 'x' states based on the input data.

    opened by redknightlois 8
  • SAC HER example results not matching

    I cloned the repo, set up the environment, and ran (with no changes)

    python her_sac_gym_fetch_reach.py

    The results don't seem to match with this. Did something break in the latest commit?

    image

    However, when I try the TD3 example, it works fine:

    python her_td3_multiworld_sawyer_reach.py

    image

    opened by Shade5 7
  • Loading checkpoint trained on GPU on a CPU device fails.

    It seems that the way model parameters are saved in rlkit/core/logging.py using pickle.dump results in checkpoints that are not recoverable on a computer without a GPU if the model was trained on a GPU. Loading the parameters of a SAC model trained on a GPU using scripts/run_policy.py results in the same error message as in this pytorch issue. I tried the different map_location arguments from that issue but they did not fix the problem for me.

    Changing pickle.dump into torch.save fixes the problem in my case. Not sure if that change has some side effects elsewhere.

    Verified this happens on commit c138bae3b3904c25de2c37c950e315410b3c0b99
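
    For reference, a minimal sketch of the workaround described above; the snapshot contents and file path are hypothetical, and this is not rlkit's actual logging code:

    import torch
    import torch.nn as nn

    # hypothetical snapshot dict standing in for what the rlkit logger checkpoints
    snapshot = {'trainer/policy': nn.Linear(4, 2)}

    # saving with torch.save (in place of pickle.dump) keeps the tensors remappable across devices
    torch.save(snapshot, 'params.pkl')

    # loading on a CPU-only machine
    data = torch.load('params.pkl', map_location='cpu')
    policy = data['trainer/policy']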

    opened by vuoristo 6
  • Unable to reproduce the results of HER-TD3

    When I first run the script with:

    python -m examples.her.her_td3_gym_fetch_reach

    It raised an error:

    Traceback (most recent call last):
      File "/home/fcloud/anaconda3/envs/rlkit/lib/python3.5/runpy.py", line 184, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/fcloud/anaconda3/envs/rlkit/lib/python3.5/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/fcloud/workspace/rlkit/examples/her/her_td3_gym_fetch_reach.py", line 88, in <module>
        experiment(variant)
      File "/home/fcloud/workspace/rlkit/examples/her/her_td3_gym_fetch_reach.py", line 65, in experiment
        **variant['algo_kwargs']
    TypeError: __init__() missing 2 required keyword-only arguments: 'her_kwargs' and 'td3_kwargs'
    

    So I modified the source code as follows:

        her_kwargs = dict(observation_key='observation', desired_goal_key='desired_goal')
        td3_kwargs = dict(env=env,
                          qf1=qf1,
                          qf2=qf2,
                          policy=policy,
                          exploration_policy=exploration_policy)
        algorithm = HerTd3(
            her_kwargs=her_kwargs,
            td3_kwargs=td3_kwargs,
            replay_buffer=replay_buffer,
            **variant['algo_kwargs']
        )
    

    Then the experiment can be launched successfully, but the results do not seem correct: [results plot]

    bug 
    opened by charliezon 6
  • Mujoco_py doesn't build

    The Docker image installs mujoco_py successfully. I can run a new Docker container, type "import mujoco_py", and it builds the Cython code.

    However, when I launch a container programmatically with doodad, for some reason it keeps erroring and trying to rebuild the Cython extension:

    Running in docker

    Import error. Trying to rebuild mujoco_py.
    running build_ext
    building 'mujoco_py.cymj' extension
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/env/lib/python3.6/site-packages/mujoco_py-1.50.1.68-py3.6.egg/mujoco_py -I/root/.mujoco/mjpro150/include -I/env/lib/python3.6/site-packages/numpy/core/include -I/usr/include/python3.6m -I/env/include/python3.6m -c /env/lib/python3.6/site-packages/mujoco_py-1.50.1.68-py3.6.egg/mujoco_py/cymj.c -o /env/lib/python3.6/site-packages/mujoco_py-1.50.1.68-py3.6.egg/mujoco_py/generated/_pyxbld_1.50.1.68_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/env/lib/python3.6/site-packages/mujoco_py-1.50.1.68-py3.6.egg/mujoco_py/cymj.o -fopenmp -w
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/env/lib/python3.6/site-packages/mujoco_py-1.50.1.68-py3.6.egg/mujoco_py -I/root/.mujoco/mjpro150/include -I/env/lib/python3.6/site-packages/numpy/core/include -I/usr/include/python3.6m -I/env/include/python3.6m -c /env/lib/python3.6/site-packages/mujoco_py-1.50.1.68-py3.6.egg/mujoco_py/gl/osmesashim.c -o /env/lib/python3.6/site-packages/mujoco_py-1.50.1.68-py3.6.egg/mujoco_py/generated/_pyxbld_1.50.1.68_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/env/lib/python3.6/site-packages/mujoco_py-1.50.1.68-py3.6.egg/mujoco_py/gl/osmesashim.o -fopenmp -w
    x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 /env/lib/python3.6/site-packages/mujoco_py-1.50.1.68-py3.6.egg/mujoco_py/generated/_pyxbld_1.50.1.68_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/env/lib/python3.6/site-packages/mujoco_py-1.50.1.68-py3.6.egg/mujoco_py/cymj.o /env/lib/python3.6/site-packages/mujoco_py-1.50.1.68-py3.6.egg/mujoco_py/generated/_pyxbld_1.50.1.68_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/env/lib/python3.6/site-packages/mujoco_py-1.50.1.68-py3.6.egg/mujoco_py/gl/osmesashim.o -L/root/.mujoco/mjpro150/bin -Wl,--enable-new-dtags,-R/root/.mujoco/mjpro150/bin -lmujoco150 -lglewosmesa -lOSMesa -lGL -o /env/lib/python3.6/site-packages/mujoco_py-1.50.1.68-py3.6.egg/mujoco_py/generated/_pyxbld_1.50.1.68_36_linuxcpuextensionbuilder/lib.linux-x86_64-3.6/mujoco_py/cymj.cpython-36m-x86_64-linux-gnu.so -fopenmp
    /usr/bin/ld: cannot find -lmujoco150
    /usr/bin/ld: cannot find -lglewosmesa
    
    

    I'm sure OSMesa is installed, and MuJoCo is downloaded and installed. The LD path looks correct:

    $ echo $LD_LIBRARY_PATH
    /usr/local/nvidia/lib64:/root/.mujoco/mjpro150/bin:/usr/local/nvidia/lib:/usr/local/nvidia/lib64

    root@e6131e04a34c:~/.mujoco# ls
    mjkey.txt  mjpro150
    root@e6131e04a34c:~/.mujoco# ls mjpro150/
    bin  doc  include  model  sample
    
    opened by richardrl 6
  • Incomplete relabelling of trajectories in HER

    Hi,

    First, thanks a lot for releasing such a nice repository. I have been using it for a few months now, and I appreciate that it is quite well written and most of the code is self-explanatory. I learned a lot just by using it.

    I am using SAC-HER and ran into a lot of divergence issues, which I fixed in the end. One of the main problems came from the relabelling of samples in the buffer: https://github.com/vitchyr/rlkit/blob/90195b24604f513403e4d0fe94db372d16700523/rlkit/data_management/obs_dict_replay_buffer.py#L228-L239 Given a batch, a set of new rewards is computed according to the updated set of goals. However, the terminals are not updated, whereas some states might be terminal given the new goal.

    And this matters in the Bellman update, where the terminals variable appears: https://github.com/vitchyr/rlkit/blob/90195b24604f513403e4d0fe94db372d16700523/rlkit/torch/sac/sac.py#L128

    If the reward is of the form -distance(state, goal) and an episode only finishes because of the maximum path length, then not updating the terminals will have little impact. That may be why this bug passed silently. However, I am working with a sparse reward which is 1 if distance(state, goal) < epsilon. In this case, if terminals are not updated then the Q-function blows up. Indeed, if we assume that target_q_values = 1 at the goal, then with terminals = 0 we get q_target = 2, at the next iteration q_target = 3, and so on. If terminals = 1, i.e. if the state is terminal according to the resampled goal, then q_target = 1.

    So in my fork of your repository, I replaced:

    new_rewards = self.env.compute_rewards(
                    new_actions,
                    new_next_obs_dict,
                )
    

    by

    new_rewards, new_terminals = self.env.compute_rewards(
                    new_actions,
                    new_next_obs_dict,
                )
    

    where terminals is 1 if distance(state, goal) < epsilon in my case. This fixed the Q-function blow-up issue.
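
    For illustration, a sketch of this kind of compute_rewards for a sparse goal-reaching task. The function below is hypothetical (the observation keys and the epsilon threshold are assumptions), not rlkit's or multiworld's actual API:

    import numpy as np

    def compute_rewards(actions, obs_dict, epsilon=0.05):
        # distance between the achieved state and the (possibly relabelled) goal
        dist = np.linalg.norm(obs_dict['achieved_goal'] - obs_dict['desired_goal'], axis=1)
        success = dist < epsilon
        rewards = success.astype(np.float32)    # sparse reward: 1 inside the goal region
        terminals = success.astype(np.float32)  # also mark states that become terminal under the new goal
        return rewards, terminals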

    opened by rstrudel 5
  • Make sure you don't need a MuJoCo license to use any of the algorithms

    There are many algorithms that import MuJoCo environments because they are not separated out. In my case I don't care about MuJoCo; in fact, I had to get a trial license just to avoid having to remove code from my fork.

    opened by redknightlois 5
  • Could someone provide the right environment installation procedure?

    Building wheel for mujoco-py (PEP 517) ... error
    ERROR: Command errored out with exit status 1:
    command: /root/miniconda3/bin/python /root/miniconda3/lib/python3.7/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmpak70wvv9
    cwd: /tmp/pip-install-d7en4dsl/mujoco-py_6a6786af6c064379a0f7b6cb3aa0da31
    Complete output (71 lines):
    running bdist_wheel
    running build

    You appear to be missing MuJoCo. We expected to find the file here: /root/.mujoco/mujoco200

    This package only provides python bindings, the library must be installed separately.

    Please follow the instructions on the README to install MuJoCo

      https://github.com/openai/mujoco-py#install-mujoco
    

    Which can be downloaded from the website

      https://www.roboti.us/index.html
    

    Traceback (most recent call last):
      File "/root/miniconda3/lib/python3.7/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 280, in main()
      File "/root/miniconda3/lib/python3.7/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 263, in main json_out['return_val'] = hook(**hook_input['kwargs'])
      File "/root/miniconda3/lib/python3.7/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 205, in build_wheel metadata_directory)
      File "/tmp/pip-build-env-_0ll4r3f/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 414, in build_wheel wheel_directory, config_settings)
      File "/tmp/pip-build-env-_0ll4r3f/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 398, in _build_with_temp_dir self.run_setup()
      File "/tmp/pip-build-env-_0ll4r3f/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 485, in run_setup self).run_setup(setup_script=setup_script)
      File "/tmp/pip-build-env-_0ll4r3f/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 335, in run_setup exec(code, locals())
      File "", line 51, in
      File "/tmp/pip-build-env-_0ll4r3f/overlay/lib/python3.7/site-packages/setuptools/init.py", line 87, in setup return distutils.core.setup(**attrs)
      File "/tmp/pip-build-env-_0ll4r3f/overlay/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 185, in setup return run_commands(dist)
      File "/tmp/pip-build-env-_0ll4r3f/overlay/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 201, in run_commands dist.run_commands()
      File "/tmp/pip-build-env-_0ll4r3f/overlay/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands self.run_command(cmd)
      File "/tmp/pip-build-env-_0ll4r3f/overlay/lib/python3.7/site-packages/setuptools/dist.py", line 1208, in run_command super().run_command(command)
      File "/tmp/pip-build-env-_0ll4r3f/overlay/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command cmd_obj.run()
      File "/tmp/pip-build-env-_0ll4r3f/normal/lib/python3.7/site-packages/wheel/bdist_wheel.py", line 325, in run self.run_command("build")
      File "/tmp/pip-build-env-_0ll4r3f/overlay/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command self.distribution.run_command(command)
      File "/tmp/pip-build-env-_0ll4r3f/overlay/lib/python3.7/site-packages/setuptools/dist.py", line 1208, in run_command super().run_command(command)
      File "/tmp/pip-build-env-_0ll4r3f/overlay/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command cmd_obj.run()
      File "", line 29, in run
      File "/tmp/pip-install-d7en4dsl/mujoco-py_6a6786af6c064379a0f7b6cb3aa0da31/mujoco_py/init.py", line 3, in from mujoco_py.builder import cymj, ignore_mujoco_warnings, functions, MujocoException
      File "/tmp/pip-install-d7en4dsl/mujoco-py_6a6786af6c064379a0f7b6cb3aa0da31/mujoco_py/builder.py", line 509, in mujoco_path, key_path = discover_mujoco()
      File "/tmp/pip-install-d7en4dsl/mujoco-py_6a6786af6c064379a0f7b6cb3aa0da31/mujoco_py/utils.py", line 93, in discover_mujoco raise Exception(message)
    Exception: You appear to be missing MuJoCo. We expected to find the file here: /root/.mujoco/mujoco200

    This package only provides python bindings, the library must be installed separately.

    Please follow the instructions on the README to install MuJoCo

      https://github.com/openai/mujoco-py#install-mujoco
    

    Which can be downloaded from the website

      https://www.roboti.us/index.html
    

    ERROR: Failed building wheel for mujoco-py
    Building wheel for Box2D-kengz (setup.py) ... error
    ERROR: Command errored out with exit status 1:
    command: /root/miniconda3/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-d7en4dsl/box2d-kengz_80f868e88aae41c08615b9f69ea9e838/setup.py'"'"'; file='"'"'/tmp/pip-install-d7en4dsl/box2d-kengz_80f868e88aae41c08615b9f69ea9e838/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-qw6ntn10
    cwd: /tmp/pip-install-d7en4dsl/box2d-kengz_80f868e88aae41c08615b9f69ea9e838/
    Complete output (17 lines):
    Using setuptools (version 52.0.0.post20210125).
    running bdist_wheel
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.7
    creating build/lib.linux-x86_64-3.7/Box2D
    copying library/Box2D/init.py -> build/lib.linux-x86_64-3.7/Box2D
    copying library/Box2D/Box2D.py -> build/lib.linux-x86_64-3.7/Box2D
    creating build/lib.linux-x86_64-3.7/Box2D/b2
    copying library/Box2D/b2/init.py -> build/lib.linux-x86_64-3.7/Box2D/b2
    running build_ext
    building 'Box2D._Box2D' extension
    swigging Box2D/Box2D.i to Box2D/Box2D_wrap.cpp
    swig -python -c++ -IBox2D -small -O -includeall -ignoremissing -w201 -globals b2Globals -outdir library/Box2D -keyword -w511 -D_SWIG_KWARGS -o Box2D/Box2D_wrap.cpp Box2D/Box2D.i
    unable to execute 'swig': No such file or directory
    error: command 'swig' failed with exit status 1

    ERROR: Failed building wheel for Box2D-kengz
    Running setup.py clean for Box2D-kengz
    Successfully built gym
    Failed to build mujoco-py Box2D-kengz
    ERROR: Could not build wheels for mujoco-py which use PEP 517 and cannot be installed directly

    opened by ZhenhuiTang 4
  • Issue SMAC algorithm

    I am having code issues with the SMAC implementation (pull request #137):

    Traceback (most recent call last):
      File "examples/smac/generate_ant_data.py", line 74, in main()
      File "examples/smac/generate_ant_data.py", line 70, in main use_gpu=gpu,
      File "/home/ubuntu/rlkit-master/rlkit/launchers/launcher_util.py", line 605, in run_experiment **run_experiment_kwargs
      File "/home/ubuntu/rlkit-master/rlkit/launchers/launcher_util.py", line 174, in run_experiment_here return experiment_function(**raw_variant)
      File "/home/ubuntu/rlkit-master/rlkit/torch/smac/pearl_launcher.py", line 173, in pearl_experiment algorithm.train()
      File "/home/ubuntu/rlkit-master/rlkit/core/meta_rl_algorithm.py", line 303, in train self.enc_replay_buffer.task_buffers[task_idx].clear()
    AttributeError: 'SimpleReplayBuffer' object has no attribute 'clear'

    and

    Traceback (most recent call last):
      File "examples/smac/generate_ant_data.py", line 74, in main()
      File "examples/smac/generate_ant_data.py", line 70, in main use_gpu=gpu,
      File "/home/ubuntu/rlkit-master/rlkit/launchers/launcher_util.py", line 605, in run_experiment **run_experiment_kwargs
      File "/home/ubuntu/rlkit-master/rlkit/launchers/launcher_util.py", line 174, in run_experiment_here return experiment_function(**raw_variant)
      File "/home/ubuntu/rlkit-master/rlkit/torch/smac/pearl_launcher.py", line 173, in pearl_experiment algorithm.train()
      File "/home/ubuntu/rlkit-master/rlkit/core/meta_rl_algorithm.py", line 436, in train self.trainer.train(batch)
      File "/home/ubuntu/rlkit-master/rlkit/torch/torch_rl_algorithm.py", line 40, in train self.train_from_torch(batch)
      File "/home/ubuntu/rlkit-master/rlkit/torch/smac/pearl.py", line 184, in train_from_torch action_distrib.rsample_logprob_and_pretanh()
    AttributeError: 'TanhNormal' object has no attribute 'rsample_logprob_and_pretanh'

    opened by ijmarrero 4
  • AWAC doesn't profit from offline data

    Hi,

    @anair13, it's nice that we can get the code. You seem to answer AWAC questions frequently, so I'm tagging you directly.

    In the AWAC paper, the main benefit is that there is no "dip" in performance when switching from offline training to online training. But when I run it on MuJoCo gym environments, it doesn't benefit from pre-training on the offline dataset.

    • HalfCheetah: it learns nothing; the episode returns are almost always below zero.
    • Ant: it reaches nearly expert performance after switching from offline to online, but then it has a huge dip to nearly zero.
    • Walker2d: it also has a dip.

    I ran the code in the repo, examples/awac/mujoco/awac1.py, with all default settings; pre-training on offline data doesn't seem to help these experiments. I found this link in the issues (https://drive.google.com/file/d/1Qy5SYIGNwdeTHAGNjbRfuP5pSiRw8JzJ/view), and in that file the learning process also doesn't seem to profit much from offline learning.

    Do I have to change any hyperparameters? It would be really nice if I could reproduce the paper results.

    Looking forward to your reply.

    Best.

    opened by im-Kitsch 3
  • Skew-fit gaussian_identity_variance

    Hi, thanks for the shared code.

    I assume the gaussian_identity_variance decoder distribution is a standard normal distribution.

    However, in the function compute_log_p_log_q_log_d, the variance was taken as logvar instead, i.e., the variance is e.

    May I confirm this, even though it's a minor issue?
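
    For reference, the log-density under a unit-variance Gaussian decoder looks like the sketch below; this illustrates the quantity being discussed, not rlkit's actual compute_log_p_log_q_log_d code:

    import math
    import torch

    def log_prob_identity_variance(x: torch.Tensor, mean: torch.Tensor) -> torch.Tensor:
        # log N(x; mean, I), summed over the data dimensions
        d = x.shape[-1]
        return -0.5 * ((x - mean) ** 2).sum(dim=-1) - 0.5 * d * math.log(2 * math.pi)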

    opened by NoListen 0
  • High Memory & Disk Requirement for SMAC

    By running python -m examples.smac.generate_ant_data, more than 70 GB is already consumed. What are the requirements for running SMAC? And how can I reduce the memory footprint?

    opened by ShayekhBinIslam 1
  • Cannot reproduce the results of IQL on antmaze

    I've run examples/iql/antmaze_finetune.py, but the results are quite bad, oscillating between 0 and 1 (as shown in the figure below), which is totally different from the result figures in examples/iql/README.md.

    [screenshot of the training curve]

    opened by Shenzhi-Wang 1
  • Position Control with mujoco-py

    Hi everyone! I would like to control a robotic end-effector (EE) in position only, so I wrote the actuator part of the XML and all the machinery needed. The problem I encounter is that during RL the action is sampled within the ctrlrange, but I would like to reach the whole joint space while keeping a limited action sample. Is there any way to solve this? Thanks!

    THE XML FILE

    <compiler angle="radian"/>
    
    <option cone="elliptic">
        <flag gravity="disable"/>
    </option>
    
    <asset>
        <texture name="texplane" type="2d" builtin="checker" rgb1=".2 .3 .4" rgb2=".1 0.15 0.2"
                width="512" height="512"/>
        <material name="MatGnd" reflectance="0.5" texture="texplane" texrepeat="1 1" texuniform="true"/>
    
        <mesh name="link0_collision" file="stl/panda/collision/link0.stl"/>
        <mesh name="link1_collision" file="stl/panda/collision/link1.stl"/>
        <mesh name="link2_collision" file="stl/panda/collision/link2.stl"/>
        <mesh name="link3_collision" file="stl/panda/collision/link3.stl"/>
        <mesh name="link4_collision" file="stl/panda/collision/link4.stl"/>
        <mesh name="link5_collision" file="stl/panda/collision/link5.stl"/>
        <mesh name="link6_collision" file="stl/panda/collision/link6.stl"/>
        <mesh name="link7_collision" file="stl/panda/collision/link7.stl"/>
        <mesh name="hand_collision" file="stl/panda/collision/hand.stl"/>
        <mesh name="finger_collision" file="stl/panda/collision/finger.stl" scale='1.75 1.0 1.75'/>
        <mesh name="link0_visual" file="stl/panda/visual/link0.stl"/>
        <mesh name="link1_visual" file="stl/panda/visual/link1.stl"/>
        <mesh name="link2_visual" file="stl/panda/visual/link2.stl"/>
        <mesh name="link3_visual" file="stl/panda/visual/link3.stl"/>
        <mesh name="link4_visual" file="stl/panda/visual/link4.stl"/>
        <mesh name="link5_visual" file="stl/panda/visual/link5.stl"/>
        <mesh name="link6_visual" file="stl/panda/visual/link6.stl"/>
        <mesh name="link7_visual" file="stl/panda/visual/link7.stl"/>
        <mesh name="hand_visual" file="stl/panda/visual/hand.stl"/>
        <mesh name="finger_visual" file="stl/panda/collision/finger.stl" scale='1.75 1.0 1.75'/>
    </asset>
    
    <visual>
    	<scale framewidth="0.05" framelength="0.8" jointwidth="0.05" jointlength="0.8" actuatorwidth="0.05" actuatorlength="0.8" forcewidth="0.1" contactwidth="0.1"/>
    </visual>
    
    <default>
        <geom condim="4"/>
        <default class="panda">
            <joint pos="0 0 0" limited="true" damping="100"/>
                <position forcelimited="true" ctrllimited="true" user="1002 40 2001 -0.005 0.005"/>
                <default class="visual">
                <geom contype="0" conaffinity="0" group="0" type="mesh" rgba=".95 .99 .92 1" mass="0"/>
            </default>
    
            <default class="collision">
                <geom contype="1" conaffinity="1" group="3" type="mesh" rgba=".5 .6 .7 1"/>
            </default>
    
            <default class="panda_finger">
                <joint damping="0" armature='5'/>
            </default>
        </default>
    </default>
    
    <worldbody>
        <light pos="0 0 1000" castshadow="false"/>
    
        <!-- FLOOR -->
        <geom name="ground" pos="0 0 0" size="5 5 10" material="MatGnd" type="plane" contype="1" conaffinity="1"/>
    
        <!-- TABLE -->
        <!--geom name="wrk_table" pos="0 0 0.2" type="box" mass="90" size=".2 .2 .2" rgba="0.9 0.9 0.9 1" contype="1" conaffinity="1"/-->
    
    
        <!--  TARGET  -->
    	<body name="target" pos="0 0 .45">
    		<!--geom type="cylider" size="0.02 0.05" rgba=".9 0 0 .5" contype="8" conaffinity="8"/-->
            <site name="target_site" type="cylinder" size="0.02 0.05" pos="0 0 0" rgba="0.9529411765 0.8 0.03529411765 0.5"/>
    	</body>
    
        <!-- HAND -->
        <body name="panda_hand" pos="0 0 0.8" euler="3.14159265359 0 0">
    
            <joint name="panda_x" pos="0 0 0" type="slide" axis="1 0 0" frictionloss="0" damping="1000"/>
            <joint name="panda_y" pos="0 0 0" type="slide" axis="0 1 0" frictionloss="0" damping="1000"/>
            <joint name="panda_z" pos="0 0 0" type="slide" axis="0 0 1" frictionloss="0" damping="1000"/>
            <joint name="panda_ball" pos="0 0 0" type="ball" frictionloss="0" damping="10"/>
    
            <site name="ee_site" pos="0 0 0" size="0.005, 0.005, 0.005" euler="0 0 -1.57"/>
            <inertial pos="0 0 0" euler="0 0 0" mass="1" diaginertia="0.1 0.1 0.1"/>
            <geom class="visual" mesh="hand_visual"/>
            <geom class="collision" mesh="hand_collision"/>
    
            <body name="panda_left_finger" pos="0 0.02 0.0584" quat="1 0 0 0">
                <!--joint name="panda_finger_joint1" axis="0 1 0" type="slide" range="-0.02 0.02" damping="1000" armature="500"/-->
                <geom class="visual" mesh="finger_visual"/>
                <geom class="collision" mesh="finger_collision" mass="1"/>
            </body>
    
            <body name="panda_right_finger" pos="0 -0.02 0.0584" quat="1 0 0 0">
                <!--joint name="panda_finger_joint2" axis="0 -1 0" type="slide" range="-0.02 0.02" damping="1000" armature="500"/-->
                <geom quat="0 0 0 1" class="visual" mesh="finger_visual"/>
                <geom quat="0 0 0 1" class="collision" mesh="finger_collision" mass="1"/>
            </body>
    
            <!-- COMPONENT -->
            <body name="component" pos="0 0 0.18">
                <geom type="cylinder" size="0.02 0.05" mass="0.1" margin="0.001"/>
                <site name="component_site" pos="0 0 0.05" rgba="0.9529411765 0.8 0.03529411765 1"/>
    
            </body>
    
        </body>
    
    </worldbody>
    
    <actuator>
    
        <position name="panda_x" joint="panda_x" class="panda_finger" gear="1" kp="100000" forcerange="-50 50" ctrllimited="true" ctrlrange="-0.01 0.01"/>
        <position name="panda_y" joint="panda_y" class="panda_finger" gear="1" kp="100000" forcerange="-50 50" ctrllimited="true" ctrlrange="-0.01 0.01"/>
        <position name="panda_z" joint="panda_z" class="panda_finger" gear="1" kp="100000" forcerange="-50 50" ctrllimited="true" ctrlrange="-0.01 0.01"/>
        <position name="panda_ball_1" joint="panda_ball" class="panda_finger" kp="10000" gear="1 0 0 0 0 0" forcerange="-50 50" ctrllimited="true" ctrlrange="-0.03 0.03"/>
        <position name="panda_ball_2" joint="panda_ball" class="panda_finger" kp="10000" gear="0 1 0 0 0 0" forcerange="-50 50" ctrllimited="true" ctrlrange="-0.03 0.03"/>
        <position name="panda_ball_3" joint="panda_ball" class="panda_finger" kp="10000" gear="0 0 1 0 0 0" forcerange="-50 50" ctrllimited="true" ctrlrange="-0.03 0.03"/>
    
    </actuator>
    
    <sensor>
        <force name="ee_force_sensor" site="ee_site"/>
        <torque name="ee_torque_sensor" site="ee_site"/>
    </sensor>
    

    THE ENVIRONMENT

        import numpy as np
        from gym import utils
        from gym.envs.mujoco import mujoco_env
        
        
        class PandaEnv(mujoco_env.MujocoEnv, utils.EzPickle):
        
            def __init__(self):
                utils.EzPickle.__init__(self)
                mujoco_env.MujocoEnv.__init__(self, "/home/bara/doc/rlkit/generic/panda.xml", 2)
        
            def step(self, a):
        
                # STEP
                self.do_simulation(a, self.frame_skip)
        
                #DISTANCE
                xpos_component = self.get_body_com("component")
                xpos_target = self.get_body_com("target")
                dist = xpos_component - xpos_target
        
                #REWARD
                reward_dist = -np.linalg.norm(dist)
                reward = reward_dist  # More contributions to rewards may be added
                # print("REWARD: " + str(reward))
                # print("ACTION: " + str(a))
        
                ob = self._get_obs()
                done = False
        
                return ob, reward, done, dict(reward_dist=reward_dist)
        
            def viewer_setup(self):
                self.viewer.cam.trackbodyid = 0
        
            def reset_model(self):
                qpos = np.zeros(self.model.nq)
                qvel = np.zeros(self.model.nv)
                self.set_state(qpos, qvel)
                return self._get_obs()
        
            def _get_obs(self):
                return np.concatenate(
                    [
                        self.get_body_com("component"),
                        self.get_body_com("target")
                    ]
                )
    
    opened by bara-bba 0
Owner
Robotic AI & Learning Lab Berkeley