Safe-Reinforcement-Learning-Baseline
The repository is for Safe Reinforcement Learning (RL) research, in which we investigate various safe RL baselines and safe RL benchmarks, including single agent RL and multi-agent RL. If any authors do not want their paper to be listed here, please feel free to contact <gshangd[AT]foxmail.com>. (This repository is under actively development. We appreciate any constructive comments and suggestions)
You are more than welcome to update this list! If you find a paper about Safe RL which is not listed here, please
- fork this repository, add it and merge back;
- or report an issue here;
- or email <gshangd[AT]foxmail.com>.
The README is organized as follows:
1. Environments Supported
1.1. Safe Single Agent RL benchmarks
1.2. Safe Multi-Agent RL benchmarks
2. Safe RL Baselines
2.1. Safe Single Agent RL Baselines
- Consideration of risk in reinforcement learning, Paper, Not Find Code, (Accepted by ICML 1994)
- Multi-criteria Reinforcement Learning, Paper, Not Find Code, (Accepted by ICML 1998)
- Lyapunov design for safe reinforcement learning, Paper, Not Find Code, (Accepted by ICML 2002)
- Risk-sensitive reinforcement learning, Paper, Not Find Code, (Accepted by Machine Learning, 2002)
- Risk-Sensitive Reinforcement Learning Applied to Control under Constraints, Paper, Not Find Code, (Accepted by Journal of Artificial Intelligence Research, 2005)
- An actor-critic algorithm for constrained markov decision processes, Paper, Not Find Code, (Accepted by Systems & Control Letters, 2005)
- Reinforcement learning for MDPs with constraints, Paper, Not Find Code, (Accepted by European Conference on Machine Learning 2006)
- Discounted Markov decision processes with utility constraints, Paper, Not Find Code, (Accepted by Computers & Mathematics with Applications, 2006)
- Constrained reinforcement learning from intrinsic and extrinsic rewards, Paper, Not Find Code, (Accepted by International Conference on Development and Learning 2007)
- Safe exploration for reinforcement learning, Paper, Not Find Code, (Accepted by ESANN 2008)
- Percentile optimization for Markov decision processes with parameter uncertainty, Paper, Not Find Code, (Accepted by Operations research, 2010)
- Probabilistic goal Markov decision processes, Paper, Not Find Code, (Accepted by AAAI 2011)
- Safe reinforcement learning in high-risk tasks through policy improvement, Paper, Not Find Code, (Accepted by IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL) 2011)
- Safe Exploration in Markov Decision Processes, Paper, Not Find Code, (Accepted by ICML 2012)
- Policy gradients with variance related risk criteria, Paper, Not Find Code, (Accepted by ICML 2012)
- Risk aversion in Markov decision processes via near optimal Chernoff bounds, Paper, Not Find Code, (Accepted by NeurIPS 2012)
- Safe Exploration of State and Action Spaces in Reinforcement Learning, Paper, Not Find Code, (Accepted by Journal of Artificial Intelligence Research, 2012)
- An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes, Paper, Not Find Code, (Accepted by Journal of Optimization Theory and Applications, 2012)
- Safe policy iteration, Paper, Not Find Code, (Accepted by ICML 2013)
- Reachability-based safe learning with Gaussian processes, Paper, Not Find Code (Accepted by IEEE CDC 2014)
- Safe Policy Search for Lifelong Reinforcement Learning with Sublinear Regret, Paper, Not Find Code, (Accepted by ICML 2015)
- High-Confidence Off-Policy Evaluation, Paper, Code (Accepted by AAAI 2015)
- Safe Exploration for Optimization with Gaussian Processes, Paper, Not Find Code (Accepted by ICML 2015)
- Safe Exploration in Finite Markov Decision Processes with Gaussian Processes, Paper, Not Find Code (Accepted by NeurIPS 2016)
- Safe and efficient off-policy reinforcement learning, Paper, Code (Accepted by NeurIPS 2016)
- Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving, Paper, Not Find Code (only Arxiv, 2016, citation 530+)
- Safe Learning of Regions of Attraction in Uncertain, Nonlinear Systems with Gaussian Processes, Paper, Code (Accepetd by CDC 2016)
- Safety-constrained reinforcement learning for MDPs, Paper, Not Find Code (Accepted by InInternational Conference on Tools and Algorithms for the Construction and Analysis of Systems 2016)
- Convex synthesis of randomized policies for controlled Markov chains with density safety upper bound constraints, Paper, Not Find Code (Accepted by American Control Conference 2016)
- Combating Deep Reinforcement Learning's Sisyphean Curse with Intrinsic Fear, Paper, Not Find Code (only Openreview, 2016)
- Combating reinforcement learning's sisyphean curse with intrinsic fear, Paper, Not Find Code (only Arxiv, 2016)
- Constrained Policy Optimization (CPO), Paper, Code (Accepted by ICML 2017)
- Risk-constrained reinforcement learning with percentile risk criteria, Paper, , Not Find Code (Accepted by The Journal of Machine Learning Research, 2017)
- Probabilistically Safe Policy Transfer, Paper, Not Find Code (Accepted by ICRA 2017)
- Accelerated primal-dual policy optimization for safe reinforcement learning, Paper, Not Find Code (Arxiv, 2017)
- Stagewise safe bayesian optimization with gaussian processes, Paper, Not Find Code (Accepted by ICML 2018)
- Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning, Paper, Code (Accepted by ICLR 2018)
- Safe Model-based Reinforcement Learning with Stability Guarantees, Paper, Code (Accepted by NeurIPS 2018)
- A Lyapunov-based Approach to Safe Reinforcement Learning, Paper, Not Find Code (Accepted by NeurIPS 2018)
- Constrained Cross-Entropy Method for Safe Reinforcement Learning, Paper, Not Find Code (Accepted by NeurIPS 2018)
- Safe Reinforcement Learning via Formal Methods, Paper, Not Find Code (Accepted by AAAI 2018)
- Safe exploration and optimization of constrained mdps using gaussian processes, Paper, Not Find Code (Accepted by AAAI 2018)
- Safe reinforcement learning via shielding, Paper, Code (Accepted by AAAI 2018)
- Trial without Error: Towards Safe Reinforcement Learning via Human Intervention, Paper, Not Find Code (Accepted by AAMAS 2018)
- Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning, Paper, Not Find Code (Accepted by CDC 2018)
- The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamical Systems, Paper, Code (Accepted by CoRL 2018)
- OptLayer - Practical Constrained Optimization for Deep Reinforcement Learning in the Real World, Paper, Not Find Code (Accepted by ICRA 2018)
- Safe reinforcement learning on autonomous vehicles, Paper, Not Find Code (Accepted by IROS 2018)
- Trial without error: Towards safe reinforcement learning via human intervention, Paper, Code (Accepted by AAMAS 2018)
- Safe reinforcement learning: Learning with supervision using a constraint-admissible set, Paper, Not Find Code (Accepted by Annual American Control Conference (ACC) 2018)
- Verification and repair of control policies for safe reinforcement learning, Paper, Not Find Code (Accepted by Applied Intelligence, 2018)
- Safe Exploration in Continuous Action Spaces, Paper, Code, (only Arxiv, 2018, citation 200+)
- Safe exploration of nonlinear dynamical systems: A predictive safety filter for reinforcement learning, Paper, Not Find Code, (Arxiv, 2018, citation 40+)
- Batch policy learning under constraints, Paper, Code, (Accepted by ICML 2019)
- Convergent Policy Optimization for Safe Reinforcement Learning, Paper, Code (Accepted by NeurIPS 2019)
- Constrained reinforcement learning has zero duality gap, Paper, Not Find Code (Accepted by NeurIPS 2019)
- Reinforcement learning with convex constraints, Paper, Code (Accepted by NeurIPS 2019)
- Reward constrained policy optimization, Paper, Not Find Code (Accepted by ICLR 2019)
- Supervised policy update for deep reinforcement learning, Paper, Code, (Accepted by ICLR 2019)
- Lyapunov-based safe policy optimization for continuous control, Paper, Not Find Code (Accepted by ICML Workshop RL4RealLife 2019)
- Safe reinforcement learning with model uncertainty estimates, Paper, Not Find Code (Accepted by ICRA 2019)
- Safe reinforcement learning with scene decomposition for navigating complex urban environments, Paper, Code, (Accepted by IV 2019)
- Verifiably safe off-model reinforcement learning, Paper, Code (Accepted by InInternational Conference on Tools and Algorithms for the Construction and Analysis of Systems 2019)
- Probabilistic policy reuse for safe reinforcement learning, Paper, Not Find Code, (Accepted by ACM Transactions on Autonomous and Adaptive Systems (TAAS), 2019)
- Projected stochastic primal-dual method for constrained online learning with kernels, Paper, Not Find Code, (Accepted by IEEE Transactions on Signal Processing, 2019)
- Resource constrained deep reinforcement learning, Paper, Not Find Code, (Accepted by 29th International Conference on Automated Planning and Scheduling 2019)
- Temporal logic guided safe reinforcement learning using control barrier functions, Paper, Not Find Code (Arxiv, Citation 25+, 2019)
- Safe policies for reinforcement learning via primal-dual methods, Paper, Not Find Code (Arxiv, Citation 25+, 2019)
- Value constrained model-free continuous control, Paper, Not Find Code (Arxiv, Citation 35+, 2019)
- Safe Reinforcement Learning in Constrained Markov Decision Processes (SNO-MDP), Paper, Code (Accepted by ICML 2020)
- Responsive Safety in Reinforcement Learning by PID Lagrangian Methods, Paper, Code (Accepted by ICML 2020)
- Constrained markov decision processes via backward value functions, Paper, Code (Accepted by ICML 2020)
- Projection-Based Constrained Policy Optimization (PCPO), Paper, Code (Accepted by ICLR 2020)
- First order constrained optimization in policy space (FOCOPS),Paper, Code (Accepted by NeurIPS 2020)
- Safe reinforcement learning via curriculum induction, Paper, Code (Accepted by NeurIPS 2020)
- Constrained episodic reinforcement learning in concave-convex and knapsack settings, Paper, Code (Accepted by NeurIPS 2020)
- Risk-sensitive reinforcement learning: Near-optimal risk-sample tradeoff in regret, Paper, Not Find Code (Accepted by NeurIPS 2020)
- IPO: Interior-point Policy Optimization under Constraints, Paper, Not Find Code (Accepted by AAAI 2020)
- Safe reinforcement learning using robust mpc, Paper, Not Find Code (IEEE Transactions on Automatic Control, 2020)
- Safe reinforcement learning via projection on a safe set: How to achieve optimality?, Paper, Not Find Code (Accepted by IFAC 2020)
- Learning Transferable Domain Priors for Safe Exploration in Reinforcement Learning, Paper, Code, (Accepted by International Joint Conference on Neural Networks (IJCNN) 2020)
- Learning safe policies with cost-sensitive advantage estimation, Paper, Not Find Code (Openreview 2020)
- Safe reinforcement learning using probabilistic shields, Paper, Not Find Code (2020)
- A constrained reinforcement learning based approach for network slicing, Paper, Not Find Code (Accepted by IEEE 28th International Conference on Network Protocols (ICNP) 2020)
- Exploration-exploitation in constrained mdps, Paper, Not Find Code (Arxiv, 2020)
- Safe reinforcement learning using advantage-based intervention, Paper, Code (Accepted by ICML 2021)
- Shortest-path constrained reinforcement learning for sparse reward tasks, Paper, Code, (Accepted by ICML 2021)
- Density constrained reinforcement learning, Paper, Not Find Code (Accepted by ICML 2021)
- CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee, Paper, Not Find Code (Accepted by ICML 2021)
- Safe Reinforcement Learning by Imagining the Near Future (SMBPO), Paper, Code (Accepted by NeurIPS 2021)
- Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning, Paper, Not Find Code (Accepted by NeurIPS 2021)
- Risk-Sensitive Reinforcement Learning: Symmetry, Asymmetry, and Risk-Sample Tradeoff, Paper, Not Find Code (Accepted by NeurIPS 2021)
- Safe reinforcement learning with natural language constraints, Paper, Code, (Accepted by NeurIPS 2021)
- Conservative safety critics for exploration, Paper, Not Find Code (Accepted by ICLR 2021)
- Risk-averse trust region optimization for reward-volatility reduction, Paper, Not Find Code (Accepted by IJCAI 2021)
- AlwaysSafe: Reinforcement Learning Without Safety Constraint Violations During Training, Paper, Code (Accepted by AAMAS 2021)
- Safe Continuous Control with Constrained Model-Based Policy Optimization (CMBPO), Paper, Code (Accepted by IROS 2021)
- Context-aware safe reinforcement learning for non-stationary environments, Paper, Code (Accepted by ICRA 2021)
- Safe model-based reinforcement learning with robust cross-entropy method, Paper, Code (Accepted by ICLR 2021 Workshop on Security and Safety in Machine Learning Systems)
- Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks, Paper, Code (Accepted by Conference on Learning for Dynamics and Control 2021)
- Can You Trust Your Autonomous Car? Interpretable and Verifiably Safe Reinforcement Learning, Paper, Not Find Code (Accepted by IV 2021)
- Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones, Paper, Code, (Accepted by IEEE RAL, 2021)
- Reinforcement learning control of constrained dynamic systems with uniformly ultimate boundedness stability guarantee, Paper, Not Find Code (Accepted by Automatica, 2021)
- A predictive safety filter for learning-based control of constrained nonlinear dynamical systems, Paper, Not Find Code (Accepted by Automatica, 2021)
- A simple reward-free approach to constrained reinforcement learning, Paper, Not Find Code (Arxiv, 2021)
- State augmented constrained reinforcement learning: Overcoming the limitations of learning with rewards, Paper, Not Find Code (Arxiv, 2021)
- Safe reinforcement learning using robust action governor, Paper, Not Find Code (Accepted by In Learning for Dynamics and Control, 2022)
- A primal-dual approach to constrained markov decision processes, Paper, Not Find Code (Arxiv, 2022)
- SAUTE RL: Almost Surely Safe Reinforcement Learning Using State Augmentation, Paper, Not Find Code (Arxiv, 2022)
- Finding Safe Zones of policies Markov Decision Processes, Paper, Not Find Code (Arxiv, 2022)
- CUP: A Conservative Update Policy Algorithm for Safe Reinforcement Learning, Paper, Code (Arxiv, 2022)
- SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition, Paper, Not Find Code (Arxiv, 2022)
- Constrained Variational Policy Optimization for Safe Reinforcement Learning, Paper, Not Find Code (Arxiv, 2022)
2.2. Safe Multi-Agent RL Baselines
- Multi-Agent Constrained Policy Optimisation (MACPO), Paper, Code (Arxiv, 2021)
- MAPPO-Lagrangian, Paper, Code (Arxiv, 2021)
3. Surveys
- A comprehensive survey on safe reinforcement learning, Paper (Accepted by Journal of Machine Learning Research, 2015)
- Safe learning and optimization techniques: Towards a survey of the state of the art, Paper (Accepted by In International Workshop on the Foundations of Trustworthy AI Integrating Learning, Optimization and Reasoning, 2020)
- Safe learning in robotics: From learning-based control to safe reinforcement learning, Paper (Accepted by Annual Review of Control, Robotics, and Autonomous Systems, 2021)
- Policy learning with constraints in model-free reinforcement learning: A survey, Paper (Accepted by IJCAI 2021)
4. Thesis
- Safe reinforcement learning, Thesis (PhD thesis, Philip S. Thomas, University of Massachusetts Amherst, 2015)
5. Book
- Constrained Markov decision processes: stochastic modeling, Book, (Eitan Altman, Routledge, 1999)