Rocket-recycling with Reinforcement Learning

Zhengxia Zou

Last update: Jan 3, 2023

Related tags

Deep Learning rocket-recycling

Overview

Rocket-recycling with Reinforcement Learning

I have long been fascinated by the recovery process of SpaceX rockets. In this mini-project, I worked on an interesting question that whether we can address this problem with simple reinforcement learning.

I tried on two tasks: hovering and landing. The rocket is simplified into a rigid body on a 2D plane with a thin rod, considering the basic cylinder dynamics model and air resistance proportional to the velocity.

Their reward functions are quite straightforward.

For the hovering tasks: the step-reward is given based on two factors:
1. the distance between the rocket and the predefined target point - the closer they are, the larger reward will be assigned.
2. the angle of the rocket body (the rocket should stay as upright as possible)
For the landing task: the step-reward is given based on three factors:
1. and 2) are the same as the hovering task
2. Speed and angle at the moment of contact with the ground - when the touching-speed are smaller than a safe threshold and the angle is close to 90 degrees (upright), we see it as a successful landing and a big reward will be assigned.

A thrust-vectoring engine is installed at the bottom of the rocket. This engine provides different thrust values (0, 0.5g, and 1.5g) with three different angles (-15, 0, and +15 degrees).

The action space is defined as a collection of the discrete control signals of the engine. The state-space consists of the rocket position (x, y), speed (vx, vy), angle (a), angle speed (va), and the simulation time steps (t).

I implement the above environment and train a policy-based agent (actor-critic) on solving this problem. The episode reward finally converges very well after over 40000 training episodes.

Despite the simple setting of the environment and the reward, the agent successfully learned the starship classic belly flop maneuver, which makes me quite surprising. The following animation shows a comparison between the real SN10 and a fake one learned from reinforcement learning.

Requirements

See Requirements.txt.

Usage

To train an agent, see ./example_train.py

To test an agent:

import torch
from rocket import Rocket
from policy import ActorCritic
import os
import glob

# Decide which device we want to run on
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

if __name__ == '__main__':

    task = 'hover'  # 'hover' or 'landing'
    max_steps = 800
    ckpt_dir = glob.glob(os.path.join(task+'_ckpt', '*.pt'))[-1]  # last ckpt

    env = Rocket(task=task, max_steps=max_steps)
    net = ActorCritic(input_dim=env.state_dims, output_dim=env.action_dims).to(device)
    if os.path.exists(ckpt_dir):
        checkpoint = torch.load(ckpt_dir)
        net.load_state_dict(checkpoint['model_G_state_dict'])

    state = env.reset()
    for step_id in range(max_steps):
        action, log_prob, value = net.get_action(state)
        state, reward, done, _ = env.step(action)
        env.render(window_name='test')
        if env.already_crash:
            break

License

Rocket-recycling by Zhengxia Zou is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Citation

@misc{zou2021rocket,
  author = {Zhengxia Zou},
  title = {Rocket-recycling with Reinforcement Learning},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/jiupinjia/rocket-recycling}}
}

You might also like...

Deep Learning and Reinforcement Learning Library for Scientists and Engineers 🔥

TensorLayer is a novel TensorFlow-based deep learning and reinforcement learning library designed for researchers and engineers. It provides an extens

7.1k Dec 29, 2022

[IROS'21] SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning

SurRoL IROS 2021 SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning Features dVRK compati

55 Jan 3, 2023

Offline Reinforcement Learning with Implicit Q-Learning

Offline Reinforcement Learning with Implicit Q-Learning This repository contains the official implementation of Offline Reinforcement Learning with Im

125 Dec 31, 2022

Reinforcement Learning with Q-Learning Algorithm on gym's frozen lake environment implemented in python

Reinforcement Learning with Q Learning Algorithm Q learning algorithm is trained on the gym's frozen lake environment. Libraries Used gym Numpy tqdm P

1 Nov 10, 2021

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

3k Dec 31, 2022

An example project demonstrating how the Autonomous Learning Library can be used to build new reinforcement learning agents.

About This repository shows how Autonomous Learning Library can be used to build new reinforcement learning agents. In particular, it contains a model

5 Aug 30, 2022

TextWorld is a sandbox learning environment for the training and evaluation of reinforcement learning (RL) agents on text-based games.

TextWorld A text-based game generator and extensible sandbox learning environment for training and testing reinforcement learning (RL) agents. Also ch

983 Dec 23, 2022

A toolkit for developing and comparing reinforcement learning algorithms.

Status: Maintenance (expect bug fixes and minor updates) OpenAI Gym OpenAI Gym is a toolkit for developing and comparing reinforcement learning algori

29.6k Jan 8, 2023

Performant, differentiable reinforcement learning

deluca Performant, differentiable reinforcement learning Notes This is pre-alpha software and is undergoing a number of core changes. Updates to follo

114 Dec 27, 2022

Comments

cannot load the latest training model when restart the training progress
It seems 'glob.glob()' will not fetch the sorted list , this following code cannot load the latest training model . At least it occurred in my laptop.

checkpoint = torch.load(glob.glob(os.path.join(ckpt_folder, '*.pt'))[-1])

I have to sort the list explicitly, so I changed it to this

checkpoint = torch.load(sorted(glob.glob(os.path.join(ckpt_folder, '*.pt')))[-1])

FYI
opened by alalan 1

Rocket-recycling with Reinforcement Learning

Related tags

Overview

Rocket-recycling with Reinforcement Learning

Requirements

Usage

License

Citation

You might also like...

Deep Learning and Reinforcement Learning Library for Scientists and Engineers 🔥

[IROS'21] SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning

Offline Reinforcement Learning with Implicit Q-Learning

Reinforcement Learning with Q-Learning Algorithm on gym's frozen lake environment implemented in python

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

An example project demonstrating how the Autonomous Learning Library can be used to build new reinforcement learning agents.

TextWorld is a sandbox learning environment for the training and evaluation of reinforcement learning (RL) agents on text-based games.

A toolkit for developing and comparing reinforcement learning algorithms.

Performant, differentiable reinforcement learning

Comments

cannot load the latest training model when restart the training progress

Owner

Zhengxia Zou

Automates Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning :rocket:

A framework that allows people to write their own Rocket League bots.

Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX

Reinforcement-learning - Repository of the class assignment questions for the course on reinforcement learning

A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading

Deep Learning and Reinforcement Learning Library for Scientists and Engineers 🔥

Learning to trade under the reinforcement learning framework

Learning to Communicate with Deep Multi-Agent Reinforcement Learning in PyTorch

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and reinforcement learning

Rocket-recycling with Reinforcement Learning

Related tags

Overview

Rocket-recycling with Reinforcement Learning

Requirements

Usage

License

Citation

You might also like...

Deep Learning and Reinforcement Learning Library for Scientists and Engineers 🔥

[IROS'21] SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning

Offline Reinforcement Learning with Implicit Q-Learning

Reinforcement Learning with Q-Learning Algorithm on gym's frozen lake environment implemented in python

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

An example project demonstrating how the Autonomous Learning Library can be used to build new reinforcement learning agents.

​TextWorld is a sandbox learning environment for the training and evaluation of reinforcement learning (RL) agents on text-based games.

A toolkit for developing and comparing reinforcement learning algorithms.

Performant, differentiable reinforcement learning

Comments

cannot load the latest training model when restart the training progress

Owner

Zhengxia Zou

Automates Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning :rocket:

A framework that allows people to write their own Rocket League bots.

Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX

Reinforcement-learning - Repository of the class assignment questions for the course on reinforcement learning

A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading

Deep Learning and Reinforcement Learning Library for Scientists and Engineers 🔥

Learning to trade under the reinforcement learning framework

Learning to Communicate with Deep Multi-Agent Reinforcement Learning in PyTorch

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and reinforcement learning

TextWorld is a sandbox learning environment for the training and evaluation of reinforcement learning (RL) agents on text-based games.