Abstract
Reinforcement learning algorithms discover policies that
maximize reward. However, these policies generally do not adhere to
safety, leaving safety in reinforcement learning (and in artificial intelligence in general) an open research problem. Shield synthesis is a formal
approach to synthesize a correct-by-construction reactive system called a
shield that enforces safety properties of a running system while interfering with its operation as little as possible. A shield attached to a learning
agent guarantees safety during learning and execution phases. In this
paper we summarize three types of shields that are synthesized from
different specification languages, and discuss their applicability to reinforcement learning. First, we discuss deterministic shields that enforce
specifications expressed as linear temporal logic specifications. Second,
we discuss the synthesis of probabilistic shields from specifications in
probabilistic temporal logic. Third, we discuss how to synthesize timed
shields from timed automata specifications. This paper summarizes the
application areas, advantages, disadvantages and synthesis approaches
for the three types of shields and gives an overview of experimental
results.
maximize reward. However, these policies generally do not adhere to
safety, leaving safety in reinforcement learning (and in artificial intelligence in general) an open research problem. Shield synthesis is a formal
approach to synthesize a correct-by-construction reactive system called a
shield that enforces safety properties of a running system while interfering with its operation as little as possible. A shield attached to a learning
agent guarantees safety during learning and execution phases. In this
paper we summarize three types of shields that are synthesized from
different specification languages, and discuss their applicability to reinforcement learning. First, we discuss deterministic shields that enforce
specifications expressed as linear temporal logic specifications. Second,
we discuss the synthesis of probabilistic shields from specifications in
probabilistic temporal logic. Third, we discuss how to synthesize timed
shields from timed automata specifications. This paper summarizes the
application areas, advantages, disadvantages and synthesis approaches
for the three types of shields and gives an overview of experimental
results.
Originalsprache | englisch |
---|---|
Titel | Leveraging Applications of Formal Methods, Verification and Validation |
Untertitel | Verification Principles - 9th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020, Proceedings |
Redakteure/-innen | Tiziana Margaria, Bernhard Steffen |
Seiten | 290-306 |
Seitenumfang | 17 |
Band | 1 |
DOIs | |
Publikationsstatus | Elektronische Veröffentlichung vor Drucklegung. - 29 Okt. 2020 |
Veranstaltung | 2020 International Symposium on Leveraging Applications of Formal Methods - Virtuell, Griechenland Dauer: 26 Okt. 2020 → 30 Okt. 2020 |
Publikationsreihe
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Band | 12476 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (elektronisch) | 1611-3349 |
Konferenz
Konferenz | 2020 International Symposium on Leveraging Applications of Formal Methods |
---|---|
Kurztitel | ISoLA 2020 |
Land/Gebiet | Griechenland |
Ort | Virtuell |
Zeitraum | 26/10/20 → 30/10/20 |
ASJC Scopus subject areas
- Theoretische Informatik
- Allgemeine Computerwissenschaft