Shield Synthesis for Reinforcement Learning

Bettina Könighofer; Roderick Bloem; Nils Jansen; Florian Lukas Lorber

doi:10.1007/978-3-030-61362-4_16

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

Reinforcement learning algorithms discover policies that maximize reward. However, these policies generally do not adhere to safety, leaving safety in reinforcement learning (and in artificial intelligence in general) an open research problem. Shield synthesis is a formal approach to synthesize a correct-by-construction reactive system called a shield that enforces safety properties of a running system while interfering with its operation as little as possible. A shield attached to a learning agent guarantees safety during learning and execution phases. In this paper we summarize three types of shields that are synthesized from different specification languages, and discuss their applicability to reinforcement learning. First, we discuss deterministic shields that enforce specifications expressed as linear temporal logic specifications. Second, we discuss the synthesis of probabilistic shields from specifications in probabilistic temporal logic. Third, we discuss how to synthesize timed shields from timed automata specifications. This paper summarizes the application areas, advantages, disadvantages and synthesis approaches for the three types of shields and gives an overview of experimental results.
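
The idea of a shield that corrects an agent only when necessary can be illustrated with a minimal sketch. This is not the synthesis procedure from the paper: it assumes the set of safe actions per state is already known, whereas the paper derives shields from temporal logic or timed automata specifications. The corridor environment and the names `make_shield` and `corridor_safe_actions` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Hashable, Iterable, List

State = Hashable
Action = Hashable


def make_shield(
    safe_actions: Callable[[State], Iterable[Action]],
) -> Callable[[State, Action], Action]:
    """Build a shield from a (hypothetical, precomputed) safe-action oracle."""

    def shield(state: State, proposed: Action) -> Action:
        allowed = list(safe_actions(state))
        if not allowed:
            raise ValueError(f"no safe action available in state {state!r}")
        # Minimal interference: a safe proposal passes through unchanged.
        if proposed in allowed:
            return proposed
        # Otherwise substitute a safe action (here simply the first one).
        return allowed[0]

    return shield


# Assumed toy environment: a corridor with positions 0..4, where
# position 4 is unsafe and positions below 0 do not exist.
def corridor_safe_actions(position: int) -> List[str]:
    moves = {"left": position - 1, "stay": position, "right": position + 1}
    return [action for action, nxt in moves.items() if 0 <= nxt <= 3]


shield = make_shield(corridor_safe_actions)
```

In a learning loop, the agent's chosen action would be passed through `shield(state, action)` before being sent to the environment, both during training and during execution, so that only safe actions are ever executed.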

Original language	English
Title of host publication	Leveraging Applications of Formal Methods, Verification and Validation
Subtitle of host publication	Verification Principles - 9th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020, Proceedings
Editors	Tiziana Margaria, Bernhard Steffen
Pages	290-306
Number of pages	17
Volume	1
DOIs	https://doi.org/10.1007/978-3-030-61362-4_16
Publication status	E-pub ahead of print - 29 Oct 2020
Event	2020 International Symposium on Leveraging Applications of Formal Methods - Virtual, Greece; Duration: 26 Oct 2020 → 30 Oct 2020

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	12476 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	2020 International Symposium on Leveraging Applications of Formal Methods
Abbreviated title	ISoLA 2020
Country/Territory	Greece
City	Virtual
Period	26/10/20 → 30/10/20

ASJC Scopus subject areas

Theoretical Computer Science
Computer Science (all)

Cite this

Könighofer, B., Bloem, R., Jansen, N., & Lorber, F. L. (2020). Shield Synthesis for Reinforcement Learning. In T. Margaria, & B. Steffen (Eds.), Leveraging Applications of Formal Methods, Verification and Validation: Verification Principles - 9th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020, Proceedings (Vol. 1, pp. 290-306). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12476 LNCS). Advance online publication. https://doi.org/10.1007/978-3-030-61362-4_16

@inproceedings{c3afa211e848440997b8d2d7751d7134,
  title = "Shield Synthesis for Reinforcement Learning",
  abstract = "Reinforcement learning algorithms discover policies that maximize reward. However, these policies generally do not adhere to safety, leaving safety in reinforcement learning (and in artificial intelligence in general) an open research problem. Shield synthesis is a formal approach to synthesize a correct-by-construction reactive system called a shield that enforces safety properties of a running system while interfering with its operation as little as possible. A shield attached to a learning agent guarantees safety during learning and execution phases. In this paper we summarize three types of shields that are synthesized from different specification languages, and discuss their applicability to reinforcement learning. First, we discuss deterministic shields that enforce specifications expressed as linear temporal logic specifications. Second, we discuss the synthesis of probabilistic shields from specifications in probabilistic temporal logic. Third, we discuss how to synthesize timed shields from timed automata specifications. This paper summarizes the application areas, advantages, disadvantages and synthesis approaches for the three types of shields and gives an overview of experimental results.",
  author = "Bettina K{\"o}nighofer and Roderick Bloem and Nils Jansen and Lorber, {Florian Lukas}",
  year = "2020",
  month = oct,
  day = "29",
  doi = "10.1007/978-3-030-61362-4_16",
  language = "English",
  isbn = "978-3-030-61361-7",
  volume = "1",
  series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
  pages = "290--306",
  editor = "Tiziana Margaria and Bernhard Steffen",
  booktitle = "Leveraging Applications of Formal Methods, Verification and Validation",
  note = "2020 International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020 ; Conference date: 26-10-2020 Through 30-10-2020",
}

TY - GEN
T1 - Shield Synthesis for Reinforcement Learning
AU - Könighofer, Bettina
AU - Bloem, Roderick
AU - Jansen, Nils
AU - Lorber, Florian Lukas
PY - 2020/10/29
Y1 - 2020/10/29
N2 - Reinforcement learning algorithms discover policies that maximize reward. However, these policies generally do not adhere to safety, leaving safety in reinforcement learning (and in artificial intelligence in general) an open research problem. Shield synthesis is a formal approach to synthesize a correct-by-construction reactive system called a shield that enforces safety properties of a running system while interfering with its operation as little as possible. A shield attached to a learning agent guarantees safety during learning and execution phases. In this paper we summarize three types of shields that are synthesized from different specification languages, and discuss their applicability to reinforcement learning. First, we discuss deterministic shields that enforce specifications expressed as linear temporal logic specifications. Second, we discuss the synthesis of probabilistic shields from specifications in probabilistic temporal logic. Third, we discuss how to synthesize timed shields from timed automata specifications. This paper summarizes the application areas, advantages, disadvantages and synthesis approaches for the three types of shields and gives an overview of experimental results.
UR - http://www.scopus.com/inward/record.url?scp=85097420844&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-61362-4_16
DO - 10.1007/978-3-030-61362-4_16
M3 - Conference paper
SN - 978-3-030-61361-7
VL - 1
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 290
EP - 306
BT - Leveraging Applications of Formal Methods, Verification and Validation
A2 - Margaria, Tiziana
A2 - Steffen, Bernhard
T2 - 2020 International Symposium on Leveraging Applications of Formal Methods
Y2 - 26 October 2020 through 30 October 2020
ER -
