# Getting Started
This repository contains the code used for the following publications:
- Probabilistic Guarantees for Safe Deep Reinforcement Learning (FORMATS 2020)
- Verifying Reinforcement Learning up to Infinity (IJCAI 2021)
- Verified Probabilistic Policies for Deep Reinforcement Learning (NFM 2022)
These instructions will help you set up the project.
## Prerequisites
Create and activate a virtual environment with conda:

```bash
conda env create -f environment.yml
conda activate safedrl
```

This installs all the Python dependencies.
In addition, download PRISM from https://github.com/phate09/prism and ensure you have Gradle installed (https://gradle.org/install/).
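A quick sanity check that the prerequisites are in place (assuming the environment created above is named `safedrl`, as in `environment.yml`):

```bash
conda activate safedrl
python --version   # should report the Python version from the conda environment
gradle --version   # confirms Gradle is installed and on the PATH
```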
## Running the code
Before running any code, open a new terminal, go to the PRISM project folder, and run:

```bash
gradle run
```

This starts the communication channel between PRISM and the rest of the repository.
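As a minimal sketch of what this channel makes possible, assuming the Gradle task exposes a py4j gateway on py4j's default port (25333); the entry-point method shown in the comment is hypothetical, and the actual API is defined by the PRISM fork linked above:

```python
from py4j.java_gateway import JavaGateway

# Connect to the JVM started by `gradle run` (py4j listens on port 25333 by default).
gateway = JavaGateway()

# Proxy for the object the PRISM side registers as its gateway entry point.
prism = gateway.entry_point

# Hypothetical call, for illustration only:
# result = prism.check_property("model.prism", 'P=? [ F "unsafe" ]')
```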
## Probabilistic Guarantees for Safe Deep Reinforcement Learning (FORMATS 2020)
### Training
Run `train_pendulum.py` inside `agents/dqn` to train the agent on the inverted pendulum problem, and record the location of the saved agent.
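For example, from the repository root (assuming the script can be invoked directly):

```bash
python agents/dqn/train_pendulum.py
```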
### Analysis
Run `domain_analysis_sym.py` inside `runnables/symbolic/dqn`, changing the paths to point to the saved network.
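For example, after editing the path constants in the script, from the repository root:

```bash
python runnables/symbolic/dqn/domain_analysis_sym.py
```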
## Verifying Reinforcement Learning up to Infinity (IJCAI 2021)
### Paper results

Download and unzip `experiment_collection_final.zip` into the `save` directory, then run:

```bash
tensorboard --logdir=./save/experiment_collection_final
```

(Results for the output range analysis experiments are in `experiment_collection_ora_final.zip`.)
### Train neural networks from scratch

Run any of the following (see the example invocation after this list):

- `training/tune_train_PPO_bouncing_ball.py`
- `training/tune_train_PPO_car.py`
- `training/tune_train_PPO_cartpole.py`
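For example, to train the cartpole agent from the repository root:

```bash
python training/tune_train_PPO_cartpole.py
```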
### Check safety of pretrained agents

Download and unzip `pretrained_agents.zip` into the `save` directory, then run:

```bash
python verification/run_tune_experiments.py
```

(To monitor the progress of the algorithm, run `tensorboard --logdir=./save/experiment_collection_final`.)
The results in TensorBoard can be filtered using regular expressions (e.g. `bouncing_ball.* template: 0`) in the search bar on the left.

The name of each experiment contains the name of the problem (`bouncing_ball`, `cartpole`, `stopping_car`), the amount of adversarial noise (`eps`, only for `stopping_car`), the time-step length used for the dynamics of the system (`tau`, only for `cartpole`), and the choice of restriction in order of complexity (0 = box, 1 = the chosen template, 2 = octagon).
The table in the paper is filled in using some of the metrics reported in TensorBoard:

| TensorBoard metric | Paper column |
| --- | --- |
| `max_t` | Avg timesteps |
| `seen` | Avg polyhedra |
| `time_since_restore` | Avg clock time (s) |