Safe Reinforcement Learning Using Probabilistic Shields

Nils Jansen, Bettina Könighofer, Sebastian Junges, Alex Serban, Roderick Bloem

Publikation: Beitrag in Buch/Bericht/KonferenzbandBeitrag in einem KonferenzbandBegutachtung


This paper concerns the efficient construction of a safety shield for reinforcement learning. We specifically target scenarios that incorporate uncertainty and use Markov decision processes (MDPs) as the underlying model to capture such problems. Reinforcement learning (RL) is a machine learning technique that can determine near-optimal policies in MDPs that may be unknown before exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables RL decision-making to adhere to safety constraints with high probability. We employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. These results help to realize a shield that, when applied to an RL algorithm, restricts the agent from taking unsafe actions, while optimizing the performance objective. We discuss tradeoffs between sufficient progress in the exploration of the environment and ensuring safety. In our experiments, we demonstrate on the arcade game PAC-MAN and on a case study involving service robots that the learning efficiency increases as the learning needs orders of magnitude fewer episodes.
Titel31st International Conference on Concurrency Theory, CONCUR 2020
Untertitel31st CONCUR 2020: Vienna, Austria (Virtual Conference)
Redakteure/-innenIgor Konnov, Laura Kovacs
Herausgeber (Verlag)Schloss Dagstuhl - Leibniz-Zentrum für Informatik
ISBN (elektronisch)978-3-95977-160-3
PublikationsstatusVeröffentlicht - 2020
Veranstaltung31st International Conference on Concurrency Theory - Virtuell, Österreich
Dauer: 1 Sep. 20204 Sep. 2020


Konferenz31st International Conference on Concurrency Theory
KurztitelCONCUR 2020

ASJC Scopus subject areas

  • Software


Untersuchen Sie die Forschungsthemen von „Safe Reinforcement Learning Using Probabilistic Shields“. Zusammen bilden sie einen einzigartigen Fingerprint.

Dieses zitieren