Automata Learning meets Shielding

Martin Tappler; Stefan Pranger; Bettina Könighofer; Edi Muskardin; Roderick Bloem; Kim Guldstrand Larsen

doi:10.1007/978-3-031-19849-6_20

Publication: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

Safety is still one of the major research challenges in reinforcement learning (RL). In this paper, we address the problem of how to avoid safety violations of RL agents during exploration in probabilistic and partially unknown environments. Our approach iteratively combines automata learning for Markov Decision Processes (MDPs) with shield synthesis. Initially, the MDP representing the environment is unknown. The agent starts exploring the environment and collects traces. From the collected traces, we passively learn MDPs that abstractly represent the safety-relevant aspects of the environment. Given a learned MDP and a safety specification, we construct a shield. For each state-action pair within a learned MDP, the shield computes the exact probability that executing the action from the current state leads to a violation of the specification within the next k steps. Once constructed, the shield is used at runtime and blocks any of the agent's actions that induce too large a risk. The shielded agent continues to explore the environment and collects new data on it. Iteratively, we use the collected data to learn new MDPs with higher accuracy, which in turn yield shields able to prevent more safety violations. We implemented our approach and present a detailed case study of a Q-learning agent exploring slippery Gridworlds. In our experiments, we show that as the agent explores more and more of the environment during training, the improved learned models lead to shields that are able to prevent many safety violations.
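
The shield computation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the small hand-written MDP, its state and action names, the helper names `k_step_risk` and `allowed_actions`, and the absolute risk threshold are all hypothetical. The sketch also assumes that, after the first action, the remaining steps use the least risky actions, and that the shield never blocks every action; the abstract states neither detail.

```python
from typing import Dict, List, Set, Tuple

State = str
Action = str
# mdp[state][action] is a list of (probability, successor) pairs.
MDP = Dict[State, Dict[Action, List[Tuple[float, State]]]]
Risk = Dict[Tuple[State, Action], float]


def k_step_risk(mdp: MDP, unsafe: Set[State], k: int) -> Risk:
    """Probability of reaching an unsafe state within k >= 1 steps when the
    given action is taken first and the least risky actions afterwards
    (the "least risky afterwards" part is an assumption of this sketch)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # value[s]: minimal violation probability from s within the steps left.
    value = {s: 1.0 if s in unsafe else 0.0 for s in mdp}
    risk: Risk = {}
    for _ in range(k):
        risk = {
            (s, a): sum(p * value[t] for p, t in successors)
            for s, actions in mdp.items()
            for a, successors in actions.items()
        }
        value = {
            s: 1.0 if s in unsafe
            else min((risk[(s, a)] for a in mdp[s]), default=0.0)
            for s in mdp
        }
    return risk


def allowed_actions(mdp: MDP, risk: Risk, state: State,
                    threshold: float) -> List[Action]:
    """Actions the shield lets through in `state`."""
    actions = list(mdp[state])
    safe = [a for a in actions if risk[(state, a)] <= threshold]
    # Assumption: if every action is too risky, allow the least risky one
    # so that the agent is never left without a choice.
    return safe or [min(actions, key=lambda a: risk[(state, a)])]


# Hypothetical learned model of a slippery corridor next to a hole.
mdp: MDP = {
    "start": {"left": [(1.0, "start")],
              "right": [(0.8, "edge"), (0.2, "start")]},
    "edge": {"left": [(0.9, "start"), (0.1, "hole")],
             "right": [(0.7, "goal"), (0.3, "hole")]},
    "hole": {"stay": [(1.0, "hole")]},
    "goal": {"stay": [(1.0, "goal")]},
}
risk = k_step_risk(mdp, unsafe={"hole"}, k=2)
print(allowed_actions(mdp, risk, "edge", threshold=0.2))
```

With these made-up numbers, going `right` from `start` has a 2-step risk of about 0.08 (a 0.8 chance of reaching `edge`, then at best a 0.1 chance of falling into the hole). At `edge`, only `left` (risk 0.1) stays under the threshold of 0.2, so `right` (risk 0.3) is blocked. In the iterative scheme from the abstract, `mdp` would be replaced by a newly learned model after each round of exploration and the risks recomputed.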

Original language	English
Title of host publication	Leveraging Applications of Formal Methods, Verification and Validation. Verification Principles - 11th International Symposium, ISoLA 2022, Proceedings
Subtitle	ISoLA 2022
Editors	Tiziana Margaria, Bernhard Steffen
Place of publication	Cham
Publisher	Springer
Pages	335-359
Number of pages	25
ISBN (electronic)	978-3-031-19849-6
ISBN (print)	978-3-031-19848-9
DOIs	https://doi.org/10.1007/978-3-031-19849-6_20
Publication status	Published - 2022
Event	ISOLA 2022: 11th International Symposium On Leveraging Applications of Formal Methods, Verification and Validation - Rhodes, Greece. Duration: 22 Oct 2022 → 30 Oct 2022. https://2022.isola-conference.org/

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13701 LNCS
ISSN (print)	0302-9743
ISSN (electronic)	1611-3349

Conference

Conference	ISOLA 2022
Abbreviated title	ISOLA 2022
Country/Territory	Greece
City	Rhodes
Period	22/10/22 → 30/10/22
Internet address	https://2022.isola-conference.org/

Keywords

Automata Learning
Markov Decision Processes
Shielding

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Access to document

10.1007/978-3-031-19849-6_20

Isola22_Automata_Learning_meets_Shielding_submission (submitted manuscript, 950 KB)

Other files and links

Link to publication in Scopus

EU - FOCETA - Foundations for Continuous Engineering of Trustworthy Autonomy
Bloem, R.
1/10/20 → 30/09/23
Project: Research project

Cite this

Tappler, M., Pranger, S., Könighofer, B., Muskardin, E., Bloem, R., & Larsen, K. G. (2022). Automata Learning meets Shielding. In T. Margaria, & B. Steffen (Eds.), Leveraging Applications of Formal Methods, Verification and Validation. Verification Principles - 11th International Symposium, ISoLA 2022, Proceedings: ISoLA 2022 (pp. 335-359). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13701 LNCS). Springer. https://doi.org/10.1007/978-3-031-19849-6_20

Automata Learning meets Shielding. / Tappler, Martin; Pranger, Stefan; Könighofer, Bettina et al.
Leveraging Applications of Formal Methods, Verification and Validation. Verification Principles - 11th International Symposium, ISoLA 2022, Proceedings: ISoLA 2022. ed. / Tiziana Margaria; Bernhard Steffen. Cham: Springer, 2022. pp. 335-359 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13701 LNCS).

Tappler, M, Pranger, S, Könighofer, B, Muskardin, E, Bloem, R & Larsen, KG 2022, Automata Learning meets Shielding. in T Margaria & B Steffen (eds), Leveraging Applications of Formal Methods, Verification and Validation. Verification Principles - 11th International Symposium, ISoLA 2022, Proceedings: ISoLA 2022. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13701 LNCS, Springer, Cham, pp. 335-359, ISOLA 2022, Rhodes, Greece, 22/10/22. https://doi.org/10.1007/978-3-031-19849-6_20

Tappler M, Pranger S, Könighofer B, Muskardin E, Bloem R, Larsen KG. Automata Learning meets Shielding. In Margaria T, Steffen B, editors, Leveraging Applications of Formal Methods, Verification and Validation. Verification Principles - 11th International Symposium, ISoLA 2022, Proceedings: ISoLA 2022. Cham: Springer. 2022. p. 335-359. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-19849-6_20

Tappler, Martin; Pranger, Stefan; Könighofer, Bettina et al. / Automata Learning meets Shielding. Leveraging Applications of Formal Methods, Verification and Validation. Verification Principles - 11th International Symposium, ISoLA 2022, Proceedings: ISoLA 2022. ed. / Tiziana Margaria; Bernhard Steffen. Cham: Springer, 2022. pp. 335-359 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{d04bfaa6500a41c0839ad3392de8b0b3,

title = "Automata Learning meets Shielding",

abstract = "Safety is still one of the major research challenges in reinforcement learning (RL). In this paper, we address the problem of how to avoid safety violations of RL agents during exploration in probabilistic and partially unknown environments. Our approach combines automata learning for Markov Decision Processes (MDPs) and shield synthesis in an iterative approach. Initially, the MDP representing the environment is unknown. The agent starts exploring the environment and collects traces. From the collected traces, we passively learn MDPs that abstractly represent the safety-relevant aspects of the environment. Given a learned MDP and a safety specification, we construct a shield. For each state-action pair within a learned MDP, the shield computes exact probabilities on how likely it is that executing the action results in violating the specification from the current state within the next k steps. After the shield is constructed, the shield is used during runtime and blocks any actions that induce a too large risk from the agent. The shielded agent continues to explore the environment and collects new data on the environment. Iteratively, we use the collected data to learn new MDPs with higher accuracy, resulting in turn in shields able to prevent more safety violations. We implemented our approach and present a detailed case study of a Q-learning agent exploring slippery Gridworlds. In our experiments, we show that as the agent explores more and more of the environment during training, the improved learned models lead to shields that are able to prevent many safety violations.",

keywords = "Automata Learning, Markov Decision Processes, Shielding",

author = "Martin Tappler and Stefan Pranger and Bettina K{\"o}nighofer and Edi Muskardin and Roderick Bloem and Larsen, {Kim Guldstrand}",

year = "2022",

doi = "10.1007/978-3-031-19849-6_20",

language = "English",

isbn = "978-3-031-19848-9",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer",

pages = "335--359",

editor = "Tiziana Margaria and Bernhard Steffen",

booktitle = "Leveraging Applications of Formal Methods, Verification and Validation. Verification Principles - 11th International Symposium, ISoLA 2022, Proceedings",

note = "ISOLA 2022 : 11th International Symposium On Leveraging Applications of Formal Methods, Verification and Validation, ISOLA 2022 ; Conference date: 22-10-2022 Through 30-10-2022",

url = "https://2022.isola-conference.org/",

}

TY - GEN

T1 - Automata Learning meets Shielding

AU - Tappler, Martin

AU - Pranger, Stefan

AU - Könighofer, Bettina

AU - Muskardin, Edi

AU - Bloem, Roderick

AU - Larsen, Kim Guldstrand

PY - 2022

Y1 - 2022

N2 - Safety is still one of the major research challenges in reinforcement learning (RL). In this paper, we address the problem of how to avoid safety violations of RL agents during exploration in probabilistic and partially unknown environments. Our approach combines automata learning for Markov Decision Processes (MDPs) and shield synthesis in an iterative approach. Initially, the MDP representing the environment is unknown. The agent starts exploring the environment and collects traces. From the collected traces, we passively learn MDPs that abstractly represent the safety-relevant aspects of the environment. Given a learned MDP and a safety specification, we construct a shield. For each state-action pair within a learned MDP, the shield computes exact probabilities on how likely it is that executing the action results in violating the specification from the current state within the next k steps. After the shield is constructed, the shield is used during runtime and blocks any actions that induce a too large risk from the agent. The shielded agent continues to explore the environment and collects new data on the environment. Iteratively, we use the collected data to learn new MDPs with higher accuracy, resulting in turn in shields able to prevent more safety violations. We implemented our approach and present a detailed case study of a Q-learning agent exploring slippery Gridworlds. In our experiments, we show that as the agent explores more and more of the environment during training, the improved learned models lead to shields that are able to prevent many safety violations.

KW - Automata Learning

KW - Markov Decision Processes

KW - Shielding

UR - http://www.scopus.com/inward/record.url?scp=85142770189&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-19849-6_20

DO - 10.1007/978-3-031-19849-6_20

M3 - Conference paper

SN - 978-3-031-19848-9

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 335

EP - 359

BT - Leveraging Applications of Formal Methods, Verification and Validation. Verification Principles - 11th International Symposium, ISoLA 2022, Proceedings

A2 - Margaria, Tiziana

A2 - Steffen, Bernhard

PB - Springer

CY - Cham

T2 - ISOLA 2022

Y2 - 22 October 2022 through 30 October 2022

ER -
