Online Shielding for Stochastic Systems

Bettina Könighofer*, Roderick Bloem, Martin Tappler, Julian Rudolf, Alexander Palmisano

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference paper; peer-review

Abstract

We propose a method to develop trustworthy reinforcement learning systems. To ensure safety, especially during exploration, we automatically synthesize a correct-by-construction runtime enforcer, called a shield, that blocks all actions of the agent that are unsafe with respect to a temporal logic specification. Our main contribution is a new synthesis algorithm for computing the shield online. Existing offline shielding approaches exhaustively compute the safety of all state-action combinations ahead of time, resulting in huge computation times, large memory consumption, and significant delays at runtime due to look-ups in huge databases. The intuition behind online shielding is to compute at runtime the set of all states that could be reached in the near future. For each of these states, the safety of all available actions is analysed, and the result is used for shielding as soon as one of the considered states is reached. Our proposed method is general and can be applied to a wide range of planning problems with stochastic behaviour. For our evaluation, we selected a 2-player version of the classical computer game Snake. The game requires fast decisions, and the multiplayer setting induces a large state space that is computationally expensive to analyse exhaustively. The safety objective of collision avoidance is easily transferable to a variety of planning tasks.
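
The online idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' synthesis algorithm: the slippery corridor model, the action set, the slip probability, the look-ahead depth, the risk horizon, and the absolute risk threshold are all hypothetical choices made only for illustration. The sketch collects the states reachable within a few steps of the current state and, for each of them, keeps the actions whose minimal probability of reaching an unsafe state within a finite horizon stays below the threshold.

```python
from collections import deque

# All constants below are illustrative assumptions, not taken from the paper.
ACTIONS = (-1, 0, 1)        # hypothetical actions: move left, stay, move right
SLIP = 0.2                  # assumed probability of sliding one extra cell
SIZE = 8                    # corridor cells 0..7
UNSAFE = {0, SIZE - 1}      # both end cells count as collisions


def clamp(x):
    """Keep a position inside the corridor."""
    return max(0, min(SIZE - 1, x))


def successors(state, action):
    """Return [(probability, next_state)] for the toy stochastic model."""
    if action == 0:
        return [(1.0, state)]
    return [(1.0 - SLIP, clamp(state + action)),
            (SLIP, clamp(state + 2 * action))]


def risk(state, action, horizon):
    """Minimal probability of hitting an unsafe state within `horizon`
    steps when `action` is taken first and the agent then acts as
    safely as possible."""
    total = 0.0
    for prob, nxt in successors(state, action):
        if nxt in UNSAFE:
            total += prob
        elif horizon > 1:
            total += prob * min(risk(nxt, a, horizon - 1) for a in ACTIONS)
    return total


def reachable(state, depth):
    """Breadth-first search for all states reachable within `depth` steps."""
    seen = {state}
    frontier = deque([(state, 0)])
    while frontier:
        s, d = frontier.popleft()
        if d == depth or s in UNSAFE:
            continue
        for a in ACTIONS:
            for _, nxt in successors(s, a):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, d + 1))
    return seen


def online_shield(state, depth=2, horizon=3, threshold=0.1):
    """Map each safe state reachable in the near future to its allowed actions."""
    return {
        s: [a for a in ACTIONS if risk(s, a, horizon) <= threshold]
        for s in reachable(state, depth)
        if s not in UNSAFE
    }
```

In use, an agent at cell 3 would call `online_shield(3)` while it is still deciding, then look up the allowed actions for whichever cell it actually lands in, instead of consulting a table precomputed for every state.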
Original language: English
Title of host publication: NASA Formal Methods - 13th International Symposium, NFM 2021, Proceedings
Editors: Aaron Dutle, César A. Muñoz, Mariano M. Moscato, Laura Titolo, Ivan Perez
Place of Publication: Cham
Publisher: Springer
Pages: 231-248
Number of pages: 18
ISBN (Electronic): 978-3-030-76384-8
ISBN (Print): 978-3-030-76383-1
DOIs
Publication status: Published - 2021
Event: 13th NASA Formal Methods Symposium - Houston, Virtual, United States
Duration: 24 May 2021 to 28 May 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12673 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 13th NASA Formal Methods Symposium
Abbreviated title: NFM 20
Country/Territory: United States
City: Virtual
Period: 24/05/21 to 28/05/21

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)
