L*-Based Learning of Markov Decision Processes (Extended Version)

Martin Tappler; Bernhard Aichernig; Giovanni Bacci; Maria Eichlseder; Kim Guldstrand Larsen

doi:10.1007/s00165-021-00536-5

Martin Tappler, Bernhard Aichernig*, Giovanni Bacci, Maria Eichlseder, Kim Guldstrand Larsen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Automata learning techniques automatically generate system models from test observations. Typically, these techniques fall into two categories: passive and active. On the one hand, passive learning assumes no interaction with the system under learning and uses a predetermined training set, e.g., system logs. On the other hand, active learning techniques collect training data by actively querying the system under learning, allowing one to steer the discovery of meaningful information about the system under learning leading to effective learning strategies. A notable example of active learning technique for regular languages is Angluin’s L∗-algorithm. The L∗-algorithm describes the strategy of a student who learns the minimal deterministic finite automaton of an unknown regular language L by asking a succinct number of queries to a teacher who knows L.
In this work, we study L∗-based learning of deterministic Markov decision processes, a class of Markov decision processes where an observation following an action uniquely determines a successor state. For this purpose, we first assume an ideal setting with a teacher who provides perfect information to the student. Then, we relax this assumption and present a novel learning algorithm that collects information by sampling execution traces of the system via testing.
Experiments performed on an implementation of our sampling-based algorithm suggest that our method achieves better accuracy than state-of-the-art passive learning techniques using the same amount of test observations. In contrast to existing learning algorithms which assume a predefined number of states, our algorithm learns the complete model structure including the state space.

Original language	English
Pages (from-to)	575-615
Number of pages	41
Journal	Formal Aspects of Computing
Volume	33
Issue number	4-5
Early online date	31 Mar 2021
DOIs	https://doi.org/10.1007/s00165-021-00536-5
Publication status	Published - Aug 2021

ASJC Scopus subject areas

Software
Theoretical Computer Science

Fields of Expertise

Information, Communication & Computing

Access to Document

10.1007/s00165-021-00536-5 (License: CC BY 4.0)

Other files and links

http://www.scopus.com/inward/record.url?scp=85103407848&partnerID=8YFLogxK

Projects: 1 finished

Verlaesslichkeit im Internet der Dinge
Boano, C. A., Kubin, G., Bloem, R., Horn, M., Pernkopf, F., Zakany, N., Mangard, S., Witrisal, K., Römer, K. U., Aichernig, B., Bösch, W., Baunach, M. C., Tappler, M., Malenko, M., Weiser, S., Eichlseder, M., Leitinger, E., Grosinger, J., Großwindhager, B., Ebrahimi, M., Alothman Alterkawi, A. B., Knoll, C., Teschl, R., Saukh, O., Rath, M., Steinberger, M., Steinbauer-Wagner, G. & Tranninger, M.
1/01/16 → 31/03/22
Project: Research project

Cite this

@article{bc29a34f8c4847b6931455aedb9d7cf1,

title = "L*-Based Learning of Markov Decision Processes (Extended Version)",

abstract = "Automata learning techniques automatically generate system models from test observations. Typically, these techniques fall into two categories: passive and active. On the one hand, passive learning assumes no interaction with the system under learning and uses a predetermined training set, e.g., system logs. On the other hand, active learning techniques collect training data by actively querying the system under learning, allowing one to steer the discovery of meaningful information about the system under learning leading to effective learning strategies. A notable example of active learning technique for regular languages is Angluin{\textquoteright}s L∗-algorithm. The L∗-algorithm describes the strategy of a student who learns the minimal deterministic finite automaton of an unknown regular language L by asking a succinct number of queries to a teacher who knows L. In this work, we study L∗-based learning of deterministic Markov decision processes, a class of Markov decision processes where an observation following an action uniquely determines a successor state. For this purpose, we first assume an ideal setting with a teacher who provides perfect information to the student. Then, we relax this assumption and present a novel learning algorithm that collects information by sampling execution traces of the system via testing. Experiments performed on an implementation of our sampling-based algorithm suggest that our method achieves better accuracy than state-of-the-art passive learning techniques using the same amount of test observations. In contrast to existing learning algorithms which assume a predefined number of states, our algorithm learns the complete model structure including the state space.",

keywords = "Active automata learning, Markov decision processes, Model inference",

author = "Martin Tappler and Bernhard Aichernig and Giovanni Bacci and Maria Eichlseder and Larsen, {Kim Guldstrand}",

year = "2021",

month = aug,

doi = "10.1007/s00165-021-00536-5",

language = "English",

volume = "33",

pages = "575--615",

journal = "Formal Aspects of Computing",

issn = "0934-5043",

publisher = "Springer",

number = "4-5",

}

TY - JOUR

T1 - L*-Based Learning of Markov Decision Processes (Extended Version)

AU - Tappler, Martin

AU - Aichernig, Bernhard

AU - Bacci, Giovanni

AU - Eichlseder, Maria

AU - Larsen, Kim Guldstrand

PY - 2021/8

Y1 - 2021/8

N2 - Automata learning techniques automatically generate system models from test observations. Typically, these techniques fall into two categories: passive and active. On the one hand, passive learning assumes no interaction with the system under learning and uses a predetermined training set, e.g., system logs. On the other hand, active learning techniques collect training data by actively querying the system under learning, allowing one to steer the discovery of meaningful information about the system under learning leading to effective learning strategies. A notable example of active learning technique for regular languages is Angluin’s L∗-algorithm. The L∗-algorithm describes the strategy of a student who learns the minimal deterministic finite automaton of an unknown regular language L by asking a succinct number of queries to a teacher who knows L. In this work, we study L∗-based learning of deterministic Markov decision processes, a class of Markov decision processes where an observation following an action uniquely determines a successor state. For this purpose, we first assume an ideal setting with a teacher who provides perfect information to the student. Then, we relax this assumption and present a novel learning algorithm that collects information by sampling execution traces of the system via testing. Experiments performed on an implementation of our sampling-based algorithm suggest that our method achieves better accuracy than state-of-the-art passive learning techniques using the same amount of test observations. In contrast to existing learning algorithms which assume a predefined number of states, our algorithm learns the complete model structure including the state space.

KW - Active automata learning

KW - Markov decision processes

KW - Model inference

UR - http://www.scopus.com/inward/record.url?scp=85103407848&partnerID=8YFLogxK

U2 - 10.1007/s00165-021-00536-5

DO - 10.1007/s00165-021-00536-5

M3 - Article

SN - 0934-5043

VL - 33

SP - 575

EP - 615

JO - Formal Aspects of Computing

JF - Formal Aspects of Computing

IS - 4-5

ER -
