Deep reinforcement learning library built on top of Neural Network Libraries

Sony

Last update: Dec 14, 2022


Overview

NNablaRL is a deep reinforcement learning library built on top of Neural Network Libraries that is intended to be used for research, development and production.

Installation

Installing NNablaRL is easy!

$ pip install nnabla-rl

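NNablaRL supports Python version >= 3.6 and NNabla version >= 1.17. Note that, according to the v0.12.0 release notes, Python 3.6 is no longer supported from v0.12.0 onward, so that release requires Python 3.7 or greater.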

Enabling GPU acceleration (Optional)

NNablaRL algorithms run on CPU by default. To run the algorithm on GPU, first install nnabla-ext-cuda as follows. (Replace [cuda-version] depending on the CUDA version installed on your machine.)

$ pip install nnabla-ext-cuda[cuda-version]

# Example installation. Supposing CUDA 11.0 is installed on your machine.
$ pip install nnabla-ext-cuda110

After installing nnabla-ext-cuda, set the id of the GPU to run the algorithm on through the algorithm's configuration.

import nnabla_rl.algorithms as A

config = A.DQNConfig(gpu_id=0) # Use gpu 0. If negative, will run on CPU.
dqn = A.DQN(env, config=config)
...
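
When the same script may run on machines with or without the CUDA extension, the gpu_id can be chosen at run time. The following is a minimal sketch and not part of NNablaRL: select_gpu_id is a hypothetical helper, and it assumes that nnabla-ext-cuda installs an importable module named nnabla_ext.cuda. Being importable does not guarantee that a GPU is actually usable.

```python
import importlib.util


def select_gpu_id(preferred: int = 0) -> int:
    """Return `preferred` if the CUDA extension looks installed, else -1 (CPU).

    Hypothetical helper: it only checks that the module can be found,
    not that a working GPU is present.
    """
    try:
        spec = importlib.util.find_spec("nnabla_ext.cuda")
    except (ImportError, ValueError):
        # The parent package is missing or could not be imported.
        spec = None
    return preferred if spec is not None else -1


gpu_id = select_gpu_id(preferred=0)
print(f"gpu_id={gpu_id}")
```

The returned value can then be passed to a configuration, for example A.DQNConfig(gpu_id=gpu_id), since a negative id runs the algorithm on CPU.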

Features

Friendly API

NNablaRL has friendly Python APIs that let you start training with only 3 lines of Python code.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4") # 1
dqn = A.DQN(env)  # 2
dqn.train(env)  # 3

For more details about NNablaRL, see the documentation and examples.

Many builtin algorithms

Most well-known and state-of-the-art (SOTA) deep reinforcement learning algorithms, such as DQN, SAC, BCQ, and GAIL, are implemented in NNablaRL. The implemented algorithms are carefully tested and evaluated. You can easily start training your agent using these verified implementations.

For the list of implemented algorithms see here.

You can also find the reproduction and evaluation results of each algorithm here.
Note that you may not get exactly the same results when running the reproduction code on your computer. Results may differ slightly depending on your machine, the versions of the nnabla and nnabla-rl packages, etc.

Seamless switching of online and offline training

In reinforcement learning, there are two main procedures for training an agent: online and offline. Online training alternates between data collection and network updates. Offline training, by contrast, updates the network using only existing data. With NNablaRL, you can switch between these two procedures seamlessly. For example, as shown below, you can train a robot's controller online in a simulated environment and then fine-tune it offline with a dataset collected from the real robot.

import nnabla_rl
import nnabla_rl.algorithms as A

simulator = get_simulator() # This is just an example. Assuming that simulator exists
dqn = A.DQN(simulator)
# train online for 1M iterations
dqn.train_online(simulator, total_iterations=1000000)

real_data = get_real_robot_data() # This is also an example. Assuming that you have real robot data
# fine tune the agent offline for 10k iterations using real data
dqn.train_offline(real_data, total_iterations=10000)
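
To make the distinction concrete, here is a toy sketch in plain Python that does not use NNablaRL. BanditAgent, simulated_reward, and both training functions are hypothetical names invented for this illustration, and the "agent" is only a value table for a 2-armed bandit. The online loop alternates data collection and updates, while the offline loop updates from a fixed dataset only.

```python
import random


class BanditAgent:
    """Toy value estimator for a bandit problem (illustration only)."""

    def __init__(self, n_actions, lr=0.1):
        self.q = [0.0] * n_actions
        self.lr = lr

    def act(self, rng, epsilon=0.1):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            return rng.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda a: self.q[a])

    def update(self, action, reward):
        # Move the estimate toward the observed reward.
        self.q[action] += self.lr * (reward - self.q[action])


def simulated_reward(action):
    # Stand-in for a simulator: arm 1 pays 1.0, arm 0 pays nothing.
    return 1.0 if action == 1 else 0.0


def train_online(agent, reward_fn, rng, total_iterations):
    # Online: collect one sample, then update, and repeat.
    for _ in range(total_iterations):
        action = agent.act(rng)
        agent.update(action, reward_fn(action))


def train_offline(agent, dataset, rng, total_iterations):
    # Offline: no interaction, only updates from existing data.
    for _ in range(total_iterations):
        action, reward = rng.choice(dataset)
        agent.update(action, reward)


rng = random.Random(0)
agent = BanditAgent(n_actions=2)
train_online(agent, simulated_reward, rng, total_iterations=1000)
online_estimate = agent.q[1]

real_data = [(1, 0.5)] * 20  # logged (action, reward) pairs from the "real" system
train_offline(agent, real_data, rng, total_iterations=200)
offline_estimate = agent.q[1]
print(online_estimate, offline_estimate)
```

After the online phase the estimate for arm 1 reflects the simulator, and the offline phase pulls it toward the value seen in the logged data, which mirrors the simulate-then-fine-tune workflow described above.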

Getting started

Try the interactive demos below to get started.
You can run them directly on Colab from the links in the table below.

Title	Notebook	Target RL task
Simple reinforcement learning training to get started		Pendulum
Learn how to use training algorithms		Pendulum
Learn how to use customized network model for training		Mountain car
Learn how to use different network solver for training		Pendulum
Learn how to use different replay buffer for training		Pendulum
Learn how to use your own environment for training		Customized environment
Atari game training example		Atari games

Documentation

Full documentation is here.

Contribution guide

Any kind of contribution to NNablaRL is welcome! See the contribution guide for details.

License

NNablaRL is provided under the Apache License, Version 2.0.

Comments

Update cem function interface

The interface of the cross-entropy method (CEM) functions has been updated. The argument pop_size has been renamed to sample_size. In addition, the objective function given to the CEM function is now called with a variable x of shape (batch_size, sample_size, x_dim), which differs from the previous interface. For details, please see the function docs.
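
To illustrate the shape convention, the following is a minimal NumPy sketch of a Gaussian cross-entropy method. It is not the NNablaRL implementation: the function cem_maximize, its parameters other than sample_size, and the assumption that the objective returns values of shape (batch_size, sample_size) to be maximized are all choices made for this example.

```python
import numpy as np


def cem_maximize(objective, init_mean, init_var, sample_size=64,
                 num_elites=8, num_iterations=30, rng=None):
    """Hypothetical batched CEM. `objective` receives x with shape
    (batch_size, sample_size, x_dim) and is assumed to return scores
    with shape (batch_size, sample_size)."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.array(init_mean, dtype=float)  # (batch_size, x_dim)
    var = np.array(init_var, dtype=float)    # (batch_size, x_dim)
    batch_size, x_dim = mean.shape
    for _ in range(num_iterations):
        noise = rng.standard_normal((batch_size, sample_size, x_dim))
        x = mean[:, None, :] + np.sqrt(var)[:, None, :] * noise
        scores = objective(x)
        # Indices of the best `num_elites` samples for each batch element.
        elite_idx = np.argsort(-scores, axis=1)[:, :num_elites]
        elites = np.take_along_axis(x, elite_idx[:, :, None], axis=1)
        # Refit the sampling distribution to the elites.
        mean = elites.mean(axis=1)
        var = elites.var(axis=1) + 1e-6
    return mean


target = np.array([1.0, -2.0])


def objective(x):
    # x: (batch_size, sample_size, x_dim) -> scores: (batch_size, sample_size)
    return -np.sum((x - target) ** 2, axis=-1)


best = cem_maximize(objective, init_mean=np.zeros((3, 2)),
                    init_var=np.ones((3, 2)), rng=np.random.default_rng(0))
print(best.shape)
```

Because the objective sees the whole (batch_size, sample_size, x_dim) array at once, it can be evaluated in a single vectorized call instead of one call per sample.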

opened by sbsekiguchi 1

Add implementation for RNN support and DRQN algorithm

Add RNN model support and DRQN algorithm.

The following trainers support RNN models:

- Q value-based trainers
- Deterministic gradient and Soft policy trainers

Other trainers may support RNN models in the future, but this is not implemented in the initial release.

See this paper for the details of the DRQN algorithm.
opened by ishihara-y 1

Implement SACD

This PR implements the SAC-D algorithm. https://arxiv.org/abs/2206.13901

These changes have been made:

- New environments with factored reward functions have been added:
  - FactoredLunarLanderContinuousV2NNablaRL-v1
  - FactoredAntV4NNablaRL-v1
  - FactoredHopperV4NNablaRL-v1
  - FactoredHalfCheetahV4NNablaRL-v1
  - FactoredWalker2dV4NNablaRL-v1
  - FactoredHumanoidV4NNablaRL-v1
- The SACD algorithm has been added
- SoftQDTrainer has been added
- _InfluenceMetricsEvaluator has been added
- A reproduction script has been added (not benchmarked yet)

visualizing influence metrics

import gym

import numpy as np
import matplotlib.pyplot as plt

import nnabla_rl.algorithms as A
import nnabla_rl.hooks as H
import nnabla_rl.writers as W
from nnabla_rl.utils.evaluator import EpisodicEvaluator

env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")
eval_env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")

evaluation_hook = H.EvaluationHook(
    eval_env,
    EpisodicEvaluator(run_per_evaluation=10),
    timing=5000,
    writer=W.FileWriter(outdir="logdir", file_prefix='evaluation_result'),
)
iteration_num_hook = H.IterationNumHook(timing=100)

config = A.SACDConfig(gpu_id=0, reward_dimension=9)
sacd = A.SACD(env, config=config)
sacd.set_hooks([iteration_num_hook, evaluation_hook])
sacd.train_online(env, total_iterations=100000)

influence_history = []

state = env.reset()
while True:
    action = sacd.compute_eval_action(state)
    influence = sacd.compute_influence_metrics(state, action)
    influence_history.append(influence)
    state, _, done, _ = env.step(action)
    if done:
        break

influence_history = np.array(influence_history)
for i, label in enumerate(["position", "velocity", "angle", "left_leg", "right_leg", "main_engine", "side_engine", "failure", "success"]):
    plt.plot(influence_history[:, i], label=label)
plt.xlabel("step")
plt.ylabel("influence metrics")
plt.legend()
plt.show()


opened by ishihara-y 0

Add gmm and Update gaussian

Added GMM and Gaussian numpy models. In addition, the Gaussian distribution's API has been updated.

The API changes as follows.

Previous:

batch_size = 10
output_dim = 10
input_shape = (batch_size, output_dim)
mean = np.zeros(shape=input_shape)
sigma = np.ones(shape=input_shape) * 5.
ln_var = np.log(sigma) * 2.
distribution = D.Gaussian(mean, ln_var)
# return nn.Variable
assert isinstance(distribution.sample(), nn.Variable)

Updated:

batch_size = 10
output_dim = 10
input_shape = (batch_size, output_dim)
mean = np.zeros(shape=input_shape)
sigma = np.ones(shape=input_shape) * 5.
ln_var = np.log(sigma) * 2.
# Pass nn.Variable arguments if you want every class method to return nn.Variable.
distribution = D.Gaussian(nn.Variable.from_numpy_array(mean), nn.Variable.from_numpy_array(ln_var))
assert isinstance(distribution.sample(), nn.Variable)

# If you pass np.ndarray arguments, every class method returns np.ndarray.
# Currently, np.ndarray inputs are supported only without a batch dimension (i.e. mean.shape = (dims,), ln_var.shape = (dims, dims)).
distribution = D.Gaussian(mean[0], np.diag(ln_var[0]))  # without batch
assert isinstance(distribution.sample(), np.ndarray)
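
The snippets above parameterize the Gaussian by the log of its variance, computed as np.log(sigma) * 2. As a small worked check using only NumPy (not the D.Gaussian class), the sketch below shows that this equals ln(sigma^2) and how the standard deviation is recovered from it. The variable names are chosen for this example.

```python
import numpy as np

sigma = np.full((3,), 5.0)     # standard deviation per dimension
ln_var = 2.0 * np.log(sigma)   # ln(sigma^2) = 2 * ln(sigma)

# Recover the standard deviation and the variance from the log-variance.
recovered_sigma = np.exp(0.5 * ln_var)
variance = np.exp(ln_var)

# Draw samples from N(mean, diag(variance)) with plain NumPy.
rng = np.random.default_rng(0)
mean = np.zeros(3)
samples = mean + recovered_sigma * rng.standard_normal((10000, 3))

print(ln_var)
print(samples.std(axis=0))
```

Working in log-variance keeps the variance positive for any real-valued input, which is one common reason for this parameterization.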

opened by sbsekiguchi 0

Support nnabla-browser

[x] add MonitorWriter
[x] save computational graph as nntxt

example

import gym

import nnabla_rl.algorithms as A
import nnabla_rl.hooks as H
import nnabla_rl.writers as W
from nnabla_rl.utils.evaluator import EpisodicEvaluator

# save training computational graph
training_graph_hook = H.TrainingGraphHook(outdir="test")

# evaluation hook with nnabla's Monitor
eval_env = gym.make("Pendulum-v0")
evaluator = EpisodicEvaluator(run_per_evaluation=10)
evaluation_hook = H.EvaluationHook(
    eval_env,
    evaluator,
    timing=10,
    writer=W.MonitorWriter(outdir="test", file_prefix='evaluation_result'),
)

env = gym.make("Pendulum-v0")
sac = A.SAC(env)
sac.set_hooks([training_graph_hook, evaluation_hook])

sac.train_online(env, total_iterations=100)

opened by ishihara-y 0

Add iLQR and LQR

Implementation of Linear Quadratic Regulator (LQR) and iterative LQR algorithms.
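
For readers unfamiliar with LQR, the following is a minimal NumPy sketch of the finite-horizon, discrete-time case. It is not the code added by this PR: lqr_gains is a hypothetical function, and it assumes linear dynamics x_{t+1} = A x_t + B u_t with stage cost x^T Q x + u^T R u and terminal cost x^T Q x. The gains come from the standard backward Riccati recursion.

```python
import numpy as np


def lqr_gains(A, B, Q, R, horizon):
    """Return feedback gains K_0, ..., K_{horizon-1} so that u_t = -K_t x_t."""
    P = Q.copy()  # cost-to-go matrix at the final step
    gains = []
    for _ in range(horizon):
        # K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A^T P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # computed backward in time, so reverse to time order
    return gains


# Scalar example: A = B = Q = R = 1.
one = np.array([[1.0]])
scalar_gains = lqr_gains(one, one, one, one, horizon=50)

# Double integrator (position, velocity) with time step 0.1.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])
x = np.array([1.0, 0.0])
for K in lqr_gains(A, B, Q, R, horizon=100):
    u = -K @ x
    x = A @ x + B @ u
print(scalar_gains[0], np.linalg.norm(x))
```

Iterative LQR applies the same backward pass repeatedly to a local linear-quadratic approximation of a nonlinear problem.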

Co-authored-by: Yu Ishihara
Co-authored-by: Shunichi Sekiguchi

opened by ishihara-y 0

Check np_random instance and use correct randint alternative

I am not sure when this change was made, but in some environments gym.unwrapped.np_random returns a Generator instead of a RandomState.

# In the case of RandomState, this line works
gym.unwrapped.np_random.randint(...)
# In the case of Generator, randint does not exist and we must use integers as an alternative
gym.unwrapped.np_random.integers(...)

This PR fixes the issue by choosing the correct function for sampling integers.
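
The dispatch can be sketched as follows. This is an illustration of the idea rather than the code in the PR: sample_int is a hypothetical helper that checks the type of the random number generator and calls NumPy's Generator.integers or RandomState.randint accordingly.

```python
import numpy as np


def sample_int(np_random, low, high):
    """Sample an integer in [low, high) from either NumPy RNG type."""
    if isinstance(np_random, np.random.Generator):
        # Newer API: Generator has `integers` and no `randint`.
        return int(np_random.integers(low, high))
    # Legacy API: RandomState provides `randint`.
    return int(np_random.randint(low, high))


legacy_rng = np.random.RandomState(0)
new_rng = np.random.default_rng(0)
legacy_values = [sample_int(legacy_rng, 0, 10) for _ in range(100)]
new_values = [sample_int(new_rng, 0, 10) for _ in range(100)]
print(legacy_values[:5], new_values[:5])
```

Both methods exclude the upper bound by default, so the helper behaves the same way for either generator type.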
opened by ishihara-y 0

Add icra2018 qtopt

Add QtOpt algorithm proposed by Deirdre Quillen et al. in the paper Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods.

opened by sbsekiguchi 0

Releases(v0.12.0)

v0.12.0(Oct 7, 2022)
special notes

This version does NOT support OpenAI Gym v0.26.0 or greater.

We plan to support OpenAI Gym v0.26.0 and greater in the next release of nnablaRL. From that release onward, nnablaRL will no longer officially support OpenAI Gym versions earlier than v0.26.0.
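
A script that depends on this release can guard against an incompatible Gym installation before training starts. The sketch below is an assumption-laden example and not part of nnablaRL: parse_version, is_supported_gym, and installed_gym_is_supported are hypothetical helpers, and the check assumes the package is installed under the distribution name gym.

```python
import re
from importlib import metadata


def parse_version(version):
    """Turn a string such as '0.25.2' into a tuple of ints, e.g. (0, 25, 2)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_supported_gym(version):
    # v0.12.0 of nnablaRL does not support gym v0.26.0 or greater.
    return parse_version(version) < (0, 26, 0)


def installed_gym_is_supported():
    """Return True/False for the installed gym, or None if gym is not installed."""
    try:
        return is_supported_gym(metadata.version("gym"))
    except metadata.PackageNotFoundError:
        return None


print(is_supported_gym("0.25.2"), is_supported_gym("0.26.0"))
print(installed_gym_is_supported())
```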

Only Python 3.7 or greater is supported

Python 3.6 is no longer supported as of this release.

release-note-bugfix

Fix algos. Properly apply grad clip and weight decay

Correct variable to use during rnn training

Check np_random instance and use correct randint alternative

Fix pendulum-env render

Fix ScreenRenderEnv to support gym 0.25.0

release-note-algorithm

Run PPO on single process when actor num is 1

Add qrsac algorithm

Add REDQ algorithm

Update to support discrete tuple

Add icra2018 qtopt

Add goal_env module

Add PPO tuple state support

Add iLQR and LQR

Add mppi

Add ddp

release-note-distributions

Add gmm and Update gaussian

release-note-utility

Support nnabla-browser

release-note-docs

Fix module path of sac

Improve README with graph visualization feature with nnabla-browser

release-note-build

Extend github build timelimit to 5 minutes

Install the latest nnablaRL by:

pip install nnabla-rl
v0.11.0(Mar 17, 2022)
release-note-bugfix

Fix readme of reproduction

Fix cem test

Fix README samples and add prerequisites for Atari reproduction codes

Fix tutorial-model

Fix add workaround to avoid gym error

release-note-algorithm

Add ATRPO

Add implementation for RNN support and DRQN algorithm, Support RNN models on DQN and DQN inherited algorithms, Follow DRQN author's implementation and update results

Expand RNN support to dist rl algorithms

Add rnn support to actor critic algorithms

Support n-step q learning in ddpg, td3, her, sac and ICML2018SAC

Stop back propagating to target v function

Add MME-SAC algorithm and Sparse/Delayed mujoco environment and Add Disentangled version of MME-SAC

release-note-functions

Add stop gradient function

Add random shooting

Update cem function interface

release-note-distributions

Add Bernoulli distribution

Enable sampling from multidimensional logits

Add one hot softmax

release-note-utility

Support batched states for evaluation

Add convenient episode result env

Add profile function

release-note-docs

Update version in algorithm catalog

Add readthedocs yaml and Fixed yaml file

Add HER and IQN to algorithm catalog

Install the latest nnablaRL by:

pip install nnabla-rl
v0.10.0(Oct 20, 2021)
release-note-bugfix

Fix interactive-demos used in colab and Fix interactive-demos used in colab about gpu id

release-note-algorithm

Add HER

Add Rainbow

Fix algorithm reproduction directory path

Add rank-based prioritized replay

Add Double Dqn

Move algorithms reproduction dir to reproductions/algorithms

Enable injecting explorer to algorithm

Support multi-step Q learning

Add Categorical Double Dqn

Add c51 all atari game results

Support Tuple State and Update compute_v_target_and_advantage to support tuple state

release-note-parametric_functions

Add spatial_softmax function and Add spatial softmax docs

Add noisy net

release-note-functions

Add batch_flatten function

Add triangular_matrix function

release-note-utility

Fix load_snapshot

release-note-docs

Fix docs typo

Fix typo in readme

Display correct version

Fix numpy array typing to np.ndarray

Add function docs

Fix docstring of algorithms

Update NNablaRL to nnablaRL

Fix typo seemless -> seamless

Fix build badge URL

Install the latest nnablaRL by:

pip install nnabla-rl
v0.9.0(Jun 14, 2021)
We are happy to announce the release of nnablaRL, a deep reinforcement learning (RL) library built on top of nnabla. Reinforcement learning is a cutting-edge machine learning technology that has achieved superhuman performance in fields such as gaming and robotics. We hope that this new library, nnablaRL, helps both RL experts and non-experts easily use reinforcement learning algorithms within the nnabla ecosystem.

nnablaRL has the following features.

Friendly API

nnablaRL has friendly Python APIs that let you start training with only 3 lines of Python code.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4") # 1
dqn = A.DQN(env) # 2
dqn.train(env) # 3

You can also easily customize the algorithm's hyperparameters. For example, you can change the batch size of the training data as follows.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4")
config = A.DQNConfig(batch_size=100)
dqn = A.DQN(env, config=config)
dqn.train(env)

In addition to the algorithm hyperparameters, you can also flexibly change training components such as neural network models and model solvers. For details, see the sample codes and API documents.

Many builtin algorithms

Most well-known and state-of-the-art (SoTA) deep reinforcement learning algorithms, such as DQN, SAC, BCQ, and GAIL, are already implemented in nnablaRL. The implemented algorithms are carefully tested and evaluated. You can easily start training your agent using these verified implementations. Please check the sample codes and documentation for detailed usage of each algorithm. You can find the list of implemented algorithms here.

Seamless switching of online and offline training

In reinforcement learning, there are two main procedures for training an agent: online and offline. Online training alternates between data collection and network updates. Offline training, by contrast, updates the network using only existing data. With nnablaRL, you can switch between these two procedures seamlessly. For example, as shown below, you can train a robot's controller online in a simulated environment and then fine-tune it offline with a dataset collected from the real robot.

import nnabla_rl
import nnabla_rl.algorithms as A

simulator = get_simulator() # This is just an example. Assuming that simulator exists
config = A.DQNConfig() # Default configuration; customize as needed
dqn = A.DQN(simulator, config=config)
dqn.train_online(simulator)

real_data = get_real_data() # This is also an example. Assuming that you have real robot data
dqn.train_offline(real_data)

Getting started

You can find both notebook-style interactive demos and raw Python scripts as sample code to get started. If you are unfamiliar with reinforcement learning, we recommend trying the notebooks as a starting point. You can immediately launch and start training through Google Colaboratory! Check the list of notebooks here.

Development of nnablaRL has just started. We will continue adding new reinforcement learning algorithms and SoTA techniques to nnablaRL. Feedback, feature requests, and contributions are welcome! Check the contribution guide for details.

Owner

Sony

Sony Group Corporation
