ElegantRL “小雅”: Scalable and Elastic Deep Reinforcement Learning
ElegantRL is developed for researchers and practitioners and has the following advantages:
- Lightweight: the core code is fewer than 1,000 lines (see elegantrl/tutorial) and uses PyTorch (training), OpenAI Gym (environments), NumPy, and Matplotlib (plotting).
- Efficient: in many test cases, we find it more efficient than Ray RLlib.
- Stable: much more stable than [Stable Baselines 3](https://github.com/DLR-RM/stable-baselines3). Stable Baselines 3 can use only a single GPU, whereas ElegantRL can use 1 to 8 GPUs for stable training.
ElegantRL implements the following model-free deep reinforcement learning (DRL) algorithms:
- DDPG, TD3, SAC, PPO, PPO (GAE), REDQ for continuous actions
- DQN, DoubleDQN, D3QN, SAC for discrete actions
- QMIX, VDN; MADDPG, MAPPO, MATD3 for multi-agent environments
For the details of DRL algorithms, please check out the educational webpage OpenAI Spinning Up.
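The name of our library, “小雅” (Xiaoya), comes from the line 「他山之石,可以攻玉」 ("stones from other hills can be used to polish jade") in the poem 《鹤鸣》 from the 《小雅》 section of the 《诗经》 (Classic of Poetry).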
News
- [Towards Data Science] ElegantRL-Podracer: A Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning
- [Towards Data Science] ElegantRL: A Lightweight and Stable Deep Reinforcement Learning Library
- [Towards Data Science] ElegantRL: Mastering PPO Algorithms
- [MLearning.ai] ElegantRL Demo: Stock Trading Using DDPG (Part I)
- [MLearning.ai] ElegantRL Demo: Stock Trading Using DDPG (Part II)
Framework (Helloworld folder)
An agent (agent.py) with Actor-Critic networks (net.py) is trained (run.py) by interacting with an environment (env.py).
A high-level overview:
1. Instantiate an environment in env.py, and an agent in agent.py with an Actor network and a Critic network in net.py;
2. In each training step in run.py, the agent interacts with the environment and generates transitions, which are stored in a Replay Buffer;
3. The agent fetches a batch of transitions from the Replay Buffer to train its networks;
4. After each update, an evaluator evaluates the agent's performance (e.g., fitness score or cumulative return) and saves the agent if the performance is good.
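Steps 2 and 3 above can be sketched in plain Python. This is only an illustration under stated assumptions, not ElegantRL's implementation: `ToyEnv`, this `ReplayBuffer`, and `explore` are hypothetical stand-ins written for this example, the policy is random, and the "training" step is reduced to drawing the batch a real agent would learn from.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical 1-D environment with gym-style reset() and step()."""

    def __init__(self, max_steps=20):
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # action is -1 or +1; the reward is the position after the move
        self.position += action
        self.steps += 1
        done = self.steps >= self.max_steps
        return self.position, float(self.position), done, {}


class ReplayBuffer:
    """Fixed-size buffer; the oldest transitions are dropped once it is full."""

    def __init__(self, max_len):
        self.data = deque(maxlen=max_len)

    def append(self, transition):
        self.data.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the buffer size
        return random.sample(list(self.data), min(batch_size, len(self.data)))


def explore(env, policy, buffer, n_steps):
    """Step 2: interact with the environment and store the transitions."""
    state = env.reset()
    for _ in range(n_steps):
        action = policy(state)
        next_state, reward, done, _ = env.step(action)
        buffer.append((state, action, reward, done, next_state))
        state = env.reset() if done else next_state


random.seed(0)
env = ToyEnv()
buffer = ReplayBuffer(max_len=100)
explore(env, policy=lambda s: random.choice((-1, 1)), buffer=buffer, n_steps=150)
batch = buffer.sample(batch_size=32)  # Step 3: the batch an agent would train on
print(len(buffer.data), len(batch))
```

Because the buffer is capped at 100 transitions, the 150 exploration steps leave only the 100 most recent ones available for sampling.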
Code Structure
Core Codes
- elegantrl/agents/net.py # Neural networks.
  - Q-Net,
  - Actor network,
  - Critic network.
- elegantrl/agents/Agent___.py # RL algorithms.
  - AgentBase.
- elegantrl/train/run___.py # run DEMO 1 ~ 4
  - Parameter initialization,
  - Training loop,
  - Evaluator.
Utility Codes
- elegantrl/envs/ # gym env or custom env, including FinanceStockEnv.
  - gym_utils.py: A PreprocessEnv class for gym-environment modification.
  - Stock_Trading_Env: A self-created stock trading environment as an example for user customization.
- eRL_demo_BipedalWalker.ipynb # BipedalWalker-v2 in a Jupyter notebook.
- eRL_demos.ipynb # Demos 1~4 in a Jupyter notebook. Shows how to use the tutorial version and the advanced version.
- eRL_demo_SingleFilePPO.py # Trains PPO from a single file; simpler than the tutorial version.
- eRL_demo_StockTrading.py # Stock trading application in Jupyter notebooks.
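To make "gym-environment modification" concrete, here is a minimal wrapper sketch. Which modifications PreprocessEnv actually performs is not stated in this README, so the example assumes one plausible case: rescaling actions so the agent can always output values in [-1, 1]. `DummyEnv` and `RescaleActionWrapper` are hypothetical names and are not part of ElegantRL.

```python
class DummyEnv:
    """Hypothetical environment whose valid actions lie in [-2.0, 2.0]."""

    action_max = 2.0

    def reset(self):
        self.state = 0.0
        return self.state

    def step(self, action):
        assert -self.action_max <= action <= self.action_max
        self.state += action
        reward = -abs(self.state)          # staying near 0 is rewarded
        done = abs(self.state) > 10.0
        return self.state, reward, done, {}


class RescaleActionWrapper:
    """Wraps an env so that the agent's actions are expected in [-1, 1]."""

    def __init__(self, env):
        self.env = env
        self.action_max = env.action_max

    def reset(self):
        return float(self.env.reset())

    def step(self, action):
        clipped = max(-1.0, min(1.0, action))  # keep the agent's action in range
        state, reward, done, info = self.env.step(clipped * self.action_max)
        return float(state), float(reward), done, info


env = RescaleActionWrapper(DummyEnv())
state = env.reset()
state, reward, done, _ = env.step(0.5)  # forwarded to DummyEnv as 1.0
print(state, reward, done)
```

The wrapper keeps the same `reset()`/`step()` interface as the environment it wraps, so the training code does not need to know whether it is talking to the raw or the wrapped environment.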
Start to Train
Initialization:
- hyper-parameters args.
- env = PreprocessEnv(): creates an environment (in the OpenAI gym format).
- agent = agent.XXX(): creates an agent for a DRL algorithm.
- buffer = ReplayBuffer(): stores the transitions.
- evaluator = Evaluator(): evaluates and stores the trained model.
Training (a while-loop):
- agent.explore_env(…): the agent explores the environment within target steps, generates transitions, and stores them into the ReplayBuffer.
- agent.update_net(…): the agent uses a batch from the ReplayBuffer to update the network parameters.
- evaluator.evaluate_save(…): evaluates the agent's performance and keeps the trained model with the highest score.
The while-loop terminates when a stopping condition is met, e.g., the agent reaches a target score, the maximum number of steps is exceeded, or the user interrupts training manually.
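The stopping logic of such a loop can be sketched as follows. This is a hedged illustration only: the `Evaluator` class, the `train` function, and all of their parameters are hypothetical and do not reproduce ElegantRL's signatures, and exploration, network updates, and evaluation are replaced by simple placeholders so the control flow stays visible.

```python
import random


class Evaluator:
    """Hypothetical evaluator: remembers the best score seen so far."""

    def __init__(self, target_score):
        self.target_score = target_score
        self.best_score = float("-inf")
        self.best_params = None

    def evaluate_save(self, score, params):
        # "Saving" keeps a copy in memory; a real evaluator would write to disk.
        if score > self.best_score:
            self.best_score = score
            self.best_params = dict(params)
        return self.best_score >= self.target_score  # True once the target is reached


def train(target_score=0.9, max_steps=10_000, steps_per_iter=100, seed=0):
    rng = random.Random(seed)
    evaluator = Evaluator(target_score)
    params = {"skill": 0.0}  # stands in for the network weights
    total_steps = 0
    reached_target = False

    try:
        while not reached_target and total_steps < max_steps:
            total_steps += steps_per_iter              # placeholder for exploration
            params["skill"] += rng.uniform(0.0, 0.05)  # placeholder for a network update
            score = min(1.0, params["skill"])          # placeholder for an evaluation run
            reached_target = evaluator.evaluate_save(score, params)
    except KeyboardInterrupt:
        pass  # manual break: keep whatever the evaluator has saved so far

    return evaluator, total_steps, reached_target


evaluator, total_steps, reached_target = train()
print(reached_target, total_steps, round(evaluator.best_score, 3))
```

The three exits correspond to the three conditions named above: the target-score flag, the step budget in the loop condition, and the `KeyboardInterrupt` handler for a manual stop.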
Experiments
Experimental Demos
Note: BipedalWalkerHardcore is a difficult task in continuous action space. Only a few RL implementations can reach the target reward. Check out an experiment video: Crack the BipedalWalkerHardcore-v2 with total reward 310 using IntelAC.
Requirements
Necessary:
| Python 3.6+ |
| PyTorch 1.6+ |
Optional:
| NumPy 1.18+ | For ReplayBuffer. NumPy is installed along with PyTorch.
| gym 0.17.0 | For env. Gym provides tutorial environments for DRL training. (env.render() has a bug with gym==0.18 and pyglet==1.6; use gym==0.17.0 and pyglet==1.5 instead.)
| pybullet 2.7+ | For env. We use PyBullet (free) as an alternative to MuJoCo (not free).
| box2d-py 2.3.8 | For gym. Install with pip install Box2D (instead of box2d-py).
| matplotlib 3.2 | For plots.
pip3 install gym==0.17.0 pybullet Box2D matplotlib
To install the StarCraft II environment:
bash ./elegantrl/envs/installsc2.sh
pip install -r sc2_requirements.txt
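A quick way to check an existing installation against the minimum versions listed above is sketched below. It is not part of ElegantRL: `MINIMUM_VERSIONS`, `version_tuple`, `meets_minimum`, and `check_requirements` are hypothetical helpers, the version comparison is deliberately simplistic (leading integers only), and `importlib.metadata` requires Python 3.8 or newer, which is stricter than the Python 3.6+ requirement stated here.

```python
import itertools
from importlib import metadata  # standard library since Python 3.8

# Minimum versions taken from the requirements listed in this README
MINIMUM_VERSIONS = {"torch": "1.6", "numpy": "1.18", "matplotlib": "3.2"}


def version_tuple(version):
    """Turn '1.18.5' or '1.6.0+cu101' into a tuple of leading integers."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    return version_tuple(installed) >= version_tuple(minimum)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return {package: status}, where status is 'ok', 'not installed', or a mismatch note."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else f"{installed} < {minimum}"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```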
Citation:
To cite this repository:
@misc{erl,
author = {Liu, Xiao-Yang and Li, Zechu and Wang, Zhaoran and Zheng, Jiahao},
title = {{ElegantRL}: A Scalable and Elastic Deep Reinforcement Learning Library},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/AI4Finance-Foundation/ElegantRL}},
}