Safe Reinforcement Learning Using Probabilistic Shields

Nils Jansen, Bettina Könighofer, Sebastian Junges, Alex Serban, Roderick Bloem

Research output: Chapter in Book/Report/Conference proceeding - Conference paper - peer-review

Abstract

This paper concerns the efficient construction of a safety shield for reinforcement learning. We specifically target scenarios that incorporate uncertainty and use Markov decision processes (MDPs) as the underlying model to capture such problems. Reinforcement learning (RL) is a machine learning technique that can determine near-optimal policies in MDPs that may be unknown before exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables RL decision-making to adhere to safety constraints with high probability. We employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. These results help to realize a shield that, when applied to an RL algorithm, restricts the agent from taking unsafe actions while optimizing the performance objective. We discuss trade-offs between sufficient progress in the exploration of the environment and ensuring safety. In our experiments on the arcade game PAC-MAN and on a case study involving service robots, we demonstrate that learning efficiency increases, since shielded learning requires orders of magnitude fewer episodes.
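The shield described in the abstract can be pictured with a small, hypothetical sketch: assume the per-state-action probabilities of staying safe within some finite horizon have already been computed offline by a probabilistic model checker (e.g. Storm) on the safety-relevant MDP fragment; the shield then simply filters the learner's action set before the usual exploration step. All names below (shield_actions, safety_value, delta) and the relative-threshold rule are illustrative assumptions, not the paper's implementation.

import random
from typing import Dict, List, Tuple

State = int
Action = str

def shield_actions(state: State,
                   actions: List[Action],
                   safety_value: Dict[Tuple[State, Action], float],
                   delta: float) -> List[Action]:
    """Keep only actions whose probability of remaining safe is at least
    delta times the best safety probability available in this state."""
    best = max(safety_value[(state, a)] for a in actions)
    # An action achieving the best safety value always passes, so the shield
    # never leaves the agent without any allowed action.
    return [a for a in actions if safety_value[(state, a)] >= delta * best]

def shielded_epsilon_greedy(state: State,
                            actions: List[Action],
                            q: Dict[Tuple[State, Action], float],
                            safety_value: Dict[Tuple[State, Action], float],
                            delta: float = 0.9,
                            epsilon: float = 0.1) -> Action:
    """Epsilon-greedy action choice restricted to shield-approved actions."""
    allowed = shield_actions(state, actions, safety_value, delta)
    if random.random() < epsilon:
        return random.choice(allowed)                          # explore, but only safely
    return max(allowed, key=lambda a: q.get((state, a), 0.0))  # exploit learned values

With delta = 1.0 such a shield permits only maximally safe actions; smaller values leave the agent more exploratory freedom, which reflects the progress-versus-safety trade-off discussed in the abstract.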
Original language: English
Title of host publication: 31st International Conference on Concurrency Theory, CONCUR 2020
Subtitle of host publication: 31st CONCUR 2020: Vienna, Austria (Virtual Conference)
Editors: Igor Konnov, Laura Kovacs
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Pages: 3:1-3:16
Number of pages: 16
ISBN (Electronic): 978-3-95977-160-3
DOIs
Publication status: Published - 2020
Event: 31st International Conference on Concurrency Theory - Virtual, Austria
Duration: 1 Sept 2020 - 4 Sept 2020

Conference

Conference: 31st International Conference on Concurrency Theory
Abbreviated title: CONCUR 2020
Country/Territory: Austria
City: Virtual
Period: 1/09/20 - 4/09/20

Keywords

  • Formal Verification
  • Markov Decision Process
  • Model Checking
  • Safe Exploration
  • Safe Reinforcement Learning

ASJC Scopus subject areas

  • Software
