Safe Reinforcement Learning Using Probabilistic Shields

Nils Jansen, Bettina Könighofer, Sebastian Junges, Alex Serban, Roderick Bloem

Research output: Chapter in Book/Report/Conference proceeding - Conference paper - peer-review

Abstract

This paper concerns the efficient construction of a safety shield for reinforcement learning. We specifically target scenarios that incorporate uncertainty and use Markov decision processes (MDPs) as the underlying model to capture such problems. Reinforcement learning (RL) is a machine learning technique that can determine near-optimal policies in MDPs that may be unknown before exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables RL decision-making to adhere to safety constraints with high probability. We employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. These results help to realize a shield that, when applied to an RL algorithm, restricts the agent from taking unsafe actions while optimizing the performance objective. We discuss trade-offs between sufficient progress in the exploration of the environment and ensuring safety. In our experiments on the arcade game PAC-MAN and on a case study involving service robots, we demonstrate that learning efficiency increases, since shielded learning requires orders of magnitude fewer episodes.
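The shield described in the abstract can be pictured with a small, hypothetical sketch: assume the per-state-action probabilities of staying safe within some finite horizon have already been computed offline by a probabilistic model checker (e.g. Storm) on the safety-relevant MDP fragment; the shield then simply filters the learner's action set before the usual exploration step. All names below (shield_actions, safety_value, delta) and the relative-threshold rule are illustrative assumptions, not the paper's implementation.

import random
from typing import Dict, List, Tuple

State = int
Action = str

def shield_actions(state: State,
                   actions: List[Action],
                   safety_value: Dict[Tuple[State, Action], float],
                   delta: float) -> List[Action]:
    """Keep only actions whose probability of remaining safe is at least
    delta times the best safety probability available in this state."""
    best = max(safety_value[(state, a)] for a in actions)
    # An action achieving the best safety value always passes, so the shield
    # never leaves the agent without any allowed action.
    return [a for a in actions if safety_value[(state, a)] >= delta * best]

def shielded_epsilon_greedy(state: State,
                            actions: List[Action],
                            q: Dict[Tuple[State, Action], float],
                            safety_value: Dict[Tuple[State, Action], float],
                            delta: float = 0.9,
                            epsilon: float = 0.1) -> Action:
    """Epsilon-greedy action choice restricted to shield-approved actions."""
    allowed = shield_actions(state, actions, safety_value, delta)
    if random.random() < epsilon:
        return random.choice(allowed)                          # explore, but only safely
    return max(allowed, key=lambda a: q.get((state, a), 0.0))  # exploit learned values

With delta = 1.0 such a shield permits only maximally safe actions; smaller values leave the agent more exploratory freedom, which reflects the progress-versus-safety trade-off discussed in the abstract.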
Original language: English
Title of host publication: 31st International Conference on Concurrency Theory, CONCUR 2020
Subtitle of host publication: 31st CONCUR 2020: Vienna, Austria (Virtual Conference)
Editors: Igor Konnov, Laura Kovacs
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Pages: 3:1-3:16
Number of pages: 16
ISBN (Electronic): 978-3-95977-160-3
DOIs
Publication status: Published - 2020
Event: 31st International Conference on Concurrency Theory - Virtual, Austria
Duration: 1 Sept 2020 - 4 Sept 2020

Conference

Conference: 31st International Conference on Concurrency Theory
Abbreviated title: CONCUR 2020
Country/Territory: Austria
City: Virtual
Period: 1/09/20 - 4/09/20

Keywords

  • Formal Verification
  • Markov Decision Process
  • Model Checking
  • Safe Exploration
  • Safe Reinforcement Learning

ASJC Scopus subject areas

  • Software
