Online Shielding for Stochastic Systems

Bettina Könighofer*, Roderick Bloem, Martin Tappler, Julian Rudolf, Alexander Palmisano

*Corresponding author for this work

Publication: Chapter in book/report/conference proceedings › Conference paper › Peer-reviewed

Abstract

We propose a method for developing trustworthy reinforcement learning systems. To ensure safety, especially during exploration, we automatically synthesize a correct-by-construction runtime enforcer, called a shield, that blocks all actions of the agent that are unsafe with respect to a temporal logic specification. Our main contribution is a new synthesis algorithm for computing the shield online. Existing offline shielding approaches exhaustively compute the safety of all state-action combinations ahead of time, resulting in huge computation times, large memory consumption, and significant delays at runtime due to look-ups in huge databases. The intuition behind online shielding is to compute, at runtime, the set of all states that could be reached in the near future. For each of these states, the safety of all available actions is analysed, and the result is used for shielding as soon as one of the considered states is reached. Our proposed method is general and can be applied to a wide range of planning problems with stochastic behaviour. For our evaluation, we selected a 2-player version of the classical computer game Snake. The game requires fast decisions, and the multiplayer setting induces a large state space that is computationally expensive to analyse exhaustively. The safety objective of collision avoidance is easily transferable to a variety of planning tasks.
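
The online-shielding idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy model (an agent on a line of cells that slips one cell towards a hazard with probability `SLIP`), the finite-horizon risk measure, the threshold rule, and all names (`risk`, `reachable`, `OnlineShield`, `prepare`, `allowed`). It only shows the general pattern: analyse the states reachable in the near future ahead of time, then answer shielding queries from a cache.

```python
from functools import lru_cache

# Hypothetical toy model (not from the paper): cells 0..6 on a line, cell 0 is a hazard.
# After every move the agent slips one extra cell to the left with probability SLIP.
SLIP = 0.2
MAX_CELL = 6
ACTIONS = ("left", "stay", "right")
STEP = {"left": -1, "stay": 0, "right": 1}


def is_unsafe(state):
    return state == 0


def transitions(state, action):
    """Return [(probability, next_state)] for the toy model."""
    target = min(max(state + STEP[action], 0), MAX_CELL)
    slipped = max(target - 1, 0)
    return [(1.0 - SLIP, target), (SLIP, slipped)]


@lru_cache(maxsize=None)
def risk(state, action, horizon):
    """Probability of reaching an unsafe state within `horizon` >= 1 steps when
    taking `action` now and the least risky action afterwards."""
    total = 0.0
    for prob, nxt in transitions(state, action):
        if is_unsafe(nxt):
            total += prob
        elif horizon > 1:
            total += prob * min(risk(nxt, a, horizon - 1) for a in ACTIONS)
    return total


def reachable(state, depth):
    """All states reachable from `state` within `depth` steps under any action."""
    seen, frontier = {state}, {state}
    for _ in range(depth):
        frontier = {
            nxt
            for s in frontier if not is_unsafe(s)
            for a in ACTIONS
            for _prob, nxt in transitions(s, a)
        } - seen
        seen |= frontier
    return seen


class OnlineShield:
    """Caches the allowed actions for states that may be visited soon."""

    def __init__(self, horizon=3, lookahead=2, threshold=0.1):
        self.horizon = horizon      # steps considered by the risk analysis
        self.lookahead = lookahead  # how far ahead states are pre-analysed
        self.threshold = threshold  # assumed rule: block actions riskier than this
        self.cache = {}

    def _analyse(self, state):
        risks = {a: risk(state, a, self.horizon) for a in ACTIONS}
        allowed = [a for a in ACTIONS if risks[a] <= self.threshold]
        # Assumed fallback: never block every action, keep the least risky one.
        return allowed or [min(risks, key=risks.get)]

    def prepare(self, state):
        """Pre-analyse every safe state reachable in the near future."""
        for s in reachable(state, self.lookahead):
            if not is_unsafe(s) and s not in self.cache:
                self.cache[s] = self._analyse(s)

    def allowed(self, state):
        """Look up the allowed actions; analyse on demand if not yet cached."""
        if state not in self.cache:
            self.cache[state] = self._analyse(state)
        return self.cache[state]
```

In this sketch, a control loop would call `prepare(current_state)` while the agent is deciding (or in a background thread) and restrict the agent's choice to `allowed(next_state)` once the next state is known. Unlike an offline shield, only the states near the current one are ever analysed.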

Original language: English
Title: NASA Formal Methods - 13th International Symposium, NFM 2021, Proceedings
Editors: Aaron Dutle, César A. Muñoz, Mariano M. Moscato, Laura Titolo, Ivan Perez
Place of publication: Cham
Publisher: Springer
Pages: 231-248
Number of pages: 18
ISBN (electronic): 978-3-030-76384-8
ISBN (print): 978-3-030-76383-1
Publication status: Published - 2021
Event: 13th NASA Formal Methods Symposium - Houston, Virtual, United States
Duration: 24 May 2021 to 28 May 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12673 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 13th NASA Formal Methods Symposium
Abbreviated title: NFM 20
Country/Territory: United States
Location: Virtual
Period: 24/05/21 to 28/05/21

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
