Shield Synthesis for Reinforcement Learning

Bettina Könighofer; Roderick Bloem; Nils Jansen; Florian Lukas Lorber

doi:10.1007/978-3-030-61362-4_16

Publication: Contribution to book/report/conference proceedings › Conference paper › Peer-reviewed

Abstract

Reinforcement learning algorithms discover policies that maximize reward. However, these policies generally do not adhere to safety, leaving safety in reinforcement learning (and in artificial intelligence in general) an open research problem. Shield synthesis is a formal approach to synthesize a correct-by-construction reactive system, called a shield, that enforces safety properties of a running system while interfering with its operation as little as possible. A shield attached to a learning agent guarantees safety during the learning and execution phases. In this paper, we summarize three types of shields that are synthesized from different specification languages and discuss their applicability to reinforcement learning. First, we discuss deterministic shields that enforce specifications expressed in linear temporal logic. Second, we discuss the synthesis of probabilistic shields from specifications in probabilistic temporal logic. Third, we discuss how to synthesize timed shields from timed automata specifications. This paper summarizes the application areas, advantages, disadvantages and synthesis approaches for the three types of shields and gives an overview of experimental results.
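The shield idea described in the abstract can be illustrated with a minimal sketch. It is not the synthesis procedure from the paper: it assumes a small, fully known, deterministic model and a plain "never reach an unsafe state" property instead of a temporal logic specification. The corridor model, the slippery cell at position 4, and all names (`winning_region`, `make_shield`, `step`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, Sequence, Set

State = int
Action = int
Step = Callable[[State, Action], State]


def winning_region(states: Iterable[State], actions: Sequence[Action],
                   step: Step, unsafe: Iterable[State]) -> Set[State]:
    """Largest set of safe states from which some action stays inside the set."""
    safe = set(states) - set(unsafe)
    changed = True
    while changed:  # fixed-point iteration: drop states with no safe successor
        changed = False
        for s in list(safe):
            if not any(step(s, a) in safe for a in actions):
                safe.discard(s)
                changed = True
    return safe


def make_shield(actions: Sequence[Action], step: Step,
                safe: Set[State]) -> Callable[[State, Action], Action]:
    """Build a shield that overrides an action only if it leaves `safe`."""
    def shield(state: State, proposed: Action) -> Action:
        if step(state, proposed) in safe:
            return proposed  # minimal interference: keep the agent's choice
        for a in actions:  # otherwise substitute the first safe alternative
            if step(state, a) in safe:
                return a
        raise ValueError("no safe action available from this state")
    return shield


# Hypothetical toy model: a corridor with positions 0..5, where 5 is unsafe.
STATES = range(6)
ACTIONS = (-1, 0, 1)
UNSAFE = {5}


def step(s: State, a: Action) -> State:
    if s == 4:  # assumed slippery cell: every action slides into position 5
        return 5
    return min(max(s + a, 0), 5)


SAFE = winning_region(STATES, ACTIONS, step, UNSAFE)
shield = make_shield(ACTIONS, step, SAFE)
```

In this toy model the safe region excludes position 4 as well as position 5, because no action taken at 4 avoids the unsafe state. The shield therefore passes `shield(2, 1)` through unchanged but replaces the proposed action in `shield(3, 1)`. A learning agent would call `shield(state, proposed_action)` before every environment step, during both training and execution.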

Original language	English
Title	Leveraging Applications of Formal Methods, Verification and Validation
Subtitle	Verification Principles - 9th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020, Proceedings
Editors	Tiziana Margaria, Bernhard Steffen
Pages	290-306
Number of pages	17
Volume	1
DOIs	https://doi.org/10.1007/978-3-030-61362-4_16
Publication status	E-pub ahead of print - 29 Oct 2020
Event	2020 International Symposium on Leveraging Applications of Formal Methods - Virtual, Greece. Duration: 26 Oct 2020 → 30 Oct 2020

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	12476 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	2020 International Symposium on Leveraging Applications of Formal Methods
Abbreviated title	ISoLA 2020
Country/Territory	Greece
City	Virtual
Period	26/10/20 → 30/10/20

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Access to document

10.1007/978-3-030-61362-4_16

Other files and links

http://www.scopus.com/inward/record.url?scp=85097420844&partnerID=8YFLogxK

Cite this

Könighofer, B., Bloem, R., Jansen, N., & Lorber, F. L. (2020). Shield Synthesis for Reinforcement Learning. In T. Margaria, & B. Steffen (Eds.), Leveraging Applications of Formal Methods, Verification and Validation: Verification Principles - 9th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020, Proceedings (Vol. 1, pp. 290-306). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12476 LNCS). Advance online publication. https://doi.org/10.1007/978-3-030-61362-4_16

Shield Synthesis for Reinforcement Learning. / Könighofer, Bettina; Bloem, Roderick; Jansen, Nils et al.
Leveraging Applications of Formal Methods, Verification and Validation: Verification Principles - 9th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020, Proceedings. ed. / Tiziana Margaria; Bernhard Steffen. Vol. 1 2020. p. 290-306 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12476 LNCS).

Könighofer, B, Bloem, R, Jansen, N & Lorber, FL 2020, Shield Synthesis for Reinforcement Learning. in T Margaria & B Steffen (eds), Leveraging Applications of Formal Methods, Verification and Validation: Verification Principles - 9th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020, Proceedings. vol. 1, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12476 LNCS, pp. 290-306, 2020 International Symposium on Leveraging Applications of Formal Methods, Virtual, Greece, 26/10/20. https://doi.org/10.1007/978-3-030-61362-4_16

Könighofer B, Bloem R, Jansen N, Lorber FL. Shield Synthesis for Reinforcement Learning. In Margaria T, Steffen B, editors, Leveraging Applications of Formal Methods, Verification and Validation: Verification Principles - 9th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020, Proceedings. Vol. 1. 2020. p. 290-306. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). Epub 2020 Oct 29. doi: 10.1007/978-3-030-61362-4_16

Könighofer, Bettina; Bloem, Roderick; Jansen, Nils et al. / Shield Synthesis for Reinforcement Learning. Leveraging Applications of Formal Methods, Verification and Validation: Verification Principles - 9th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020, Proceedings. ed. / Tiziana Margaria; Bernhard Steffen. Vol. 1 2020. p. 290-306 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{c3afa211e848440997b8d2d7751d7134,
title = "Shield Synthesis for Reinforcement Learning",
abstract = "Reinforcement learning algorithms discover policies that maximize reward. However, these policies generally do not adhere to safety, leaving safety in reinforcement learning (and in artificial intelligence in general) an open research problem. Shield synthesis is a formal approach to synthesize a correct-by-construction reactive system called a shield that enforces safety properties of a running system while interfering with its operation as little as possible. A shield attached to a learning agent guarantees safety during learning and execution phases. In this paper we summarize three types of shields that are synthesized from different specification languages, and discuss their applicability to reinforcement learning. First, we discuss deterministic shields that enforce specifications expressed as linear temporal logic specifications. Second, we discuss the synthesis of probabilistic shields from specifications in probabilistic temporal logic. Third, we discuss how to synthesize timed shields from timed automata specifications. This paper summarizes the application areas, advantages, disadvantages and synthesis approaches for the three types of shields and gives an overview of experimental results.",
author = "Bettina K{\"o}nighofer and Roderick Bloem and Nils Jansen and Lorber, {Florian Lukas}",
year = "2020",
month = oct,
day = "29",
doi = "10.1007/978-3-030-61362-4_16",
language = "English",
isbn = "978-3-030-61361-7",
volume = "1",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "290--306",
editor = "Tiziana Margaria and Bernhard Steffen",
booktitle = "Leveraging Applications of Formal Methods, Verification and Validation",
note = "2020 International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020 ; Conference date: 26-10-2020 Through 30-10-2020",
}

TY - GEN
T1 - Shield Synthesis for Reinforcement Learning
AU - Könighofer, Bettina
AU - Bloem, Roderick
AU - Jansen, Nils
AU - Lorber, Florian Lukas
PY - 2020/10/29
Y1 - 2020/10/29
N2 - Reinforcement learning algorithms discover policies that maximize reward. However, these policies generally do not adhere to safety, leaving safety in reinforcement learning (and in artificial intelligence in general) an open research problem. Shield synthesis is a formal approach to synthesize a correct-by-construction reactive system called a shield that enforces safety properties of a running system while interfering with its operation as little as possible. A shield attached to a learning agent guarantees safety during learning and execution phases. In this paper we summarize three types of shields that are synthesized from different specification languages, and discuss their applicability to reinforcement learning. First, we discuss deterministic shields that enforce specifications expressed as linear temporal logic specifications. Second, we discuss the synthesis of probabilistic shields from specifications in probabilistic temporal logic. Third, we discuss how to synthesize timed shields from timed automata specifications. This paper summarizes the application areas, advantages, disadvantages and synthesis approaches for the three types of shields and gives an overview of experimental results.
AB - Reinforcement learning algorithms discover policies that maximize reward. However, these policies generally do not adhere to safety, leaving safety in reinforcement learning (and in artificial intelligence in general) an open research problem. Shield synthesis is a formal approach to synthesize a correct-by-construction reactive system called a shield that enforces safety properties of a running system while interfering with its operation as little as possible. A shield attached to a learning agent guarantees safety during learning and execution phases. In this paper we summarize three types of shields that are synthesized from different specification languages, and discuss their applicability to reinforcement learning. First, we discuss deterministic shields that enforce specifications expressed as linear temporal logic specifications. Second, we discuss the synthesis of probabilistic shields from specifications in probabilistic temporal logic. Third, we discuss how to synthesize timed shields from timed automata specifications. This paper summarizes the application areas, advantages, disadvantages and synthesis approaches for the three types of shields and gives an overview of experimental results.
UR - http://www.scopus.com/inward/record.url?scp=85097420844&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-61362-4_16
DO - 10.1007/978-3-030-61362-4_16
M3 - Conference paper
SN - 978-3-030-61361-7
VL - 1
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 290
EP - 306
BT - Leveraging Applications of Formal Methods, Verification and Validation
A2 - Margaria, Tiziana
A2 - Steffen, Bernhard
T2 - 2020 International Symposium on Leveraging Applications of Formal Methods
Y2 - 26 October 2020 through 30 October 2020
ER -
