Using textual bug reports to predict the fault category of software bugs

Thomas Hirsch; Birgit Gertraud Hofer

doi:10.1016/j.array.2022.100189

Thomas Hirsch, Birgit Gertraud Hofer^*

^*Corresponding author for this work

Institut für Softwaretechnologie (7160)

Publication: Contribution to journal › Article › peer-review

Abstract

Debugging is a time-consuming and expensive process. Developers have to select appropriate tools, methods and approaches in order to efficiently reproduce, localize and fix bugs. These choices are based on the developers’ assessment of the type of fault for a given bug report. This paper proposes a machine learning (ML) based approach that predicts the fault type for a given textual bug report. We built a dataset from 70+ projects for training and evaluation of our approach. Further, we performed a user study to establish a baseline for non-expert human performance on this task. Our models, incorporating our custom preprocessing approaches, reach a macro average F1 score of up to 0.69 on this bug classification problem. We demonstrate inter-project transferability of our approach. Further, we identify and discuss issues and limitations of ML classification approaches applied to textual bug reports. Our models can support researchers in data collection efforts, for example in bug benchmark creation. In the future, such models could aid inexperienced developers in debugging tool selection, helping save time and resources.
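For readers unfamiliar with the macro average F1 score mentioned in the abstract, the following is a minimal sketch of how the metric is commonly computed: the F1 score is calculated per class and then averaged with equal weight per class. The function name `macro_f1` and the category labels in the usage example are hypothetical and chosen only for illustration; this is not the authors' evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative sketch)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only
y_true = ["memory", "memory", "concurrency", "other"]
y_pred = ["memory", "other", "concurrency", "other"]
print(round(macro_f1(y_true, y_pred), 4))
```

Because every class contributes equally to the average, a rare fault category influences the score as much as a frequent one, which is why the metric is often preferred for imbalanced classification problems.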

Original language	English
Article number	100189
Number of pages	12
Journal	Array
Volume	15
DOIs	https://doi.org/10.1016/j.array.2022.100189
Publication status	Published - Sept 2022

ASJC Scopus subject areas

Software

Fields of Expertise

Information, Communication & Computing

Treatment code (more detailed classification)

Basic - Fundamental (basic research)

Access to Document

10.1016/j.array.2022.100189 (License: CC BY 4.0)

FWF - AMADEUS - Die Nutzbarmachung von automatischen Debugging-Ansätzen
Hofer, B. G.
1/01/20 → 30/04/24
Project: Research project

Cite this

@article{a13566ebca264d5ab29ed62a2e91fc85,
title = "Using textual bug reports to predict the fault category of software bugs",
abstract = "Debugging is a time-consuming and expensive process. Developers have to select appropriate tools, methods and approaches in order to efficiently reproduce, localize and fix bugs. These choices are based on the developers{\textquoteright} assessment of the type of fault for a given bug report. This paper proposes a machine learning (ML) based approach that predicts the fault type for a given textual bug report. We built a dataset from 70+ projects for training and evaluation of our approach. Further, we performed a user study to establish a baseline for non-expert human performance on this task. Our models, incorporating our custom preprocessing approaches, reach a macro average F1 score of up to 0.69 on this bug classification problem. We demonstrate inter-project transferability of our approach. Further, we identify and discuss issues and limitations of ML classification approaches applied to textual bug reports. Our models can support researchers in data collection efforts, for example in bug benchmark creation. In the future, such models could aid inexperienced developers in debugging tool selection, helping save time and resources.",
keywords = "Bug report, Bug benchmark, Fault type prediction",
author = "Thomas Hirsch and Hofer, {Birgit Gertraud}",
year = "2022",
month = sep,
doi = "10.1016/j.array.2022.100189",
language = "English",
volume = "15",
journal = "Array",
issn = "2590-0056",
publisher = "Elsevier B.V.",
}

TY - JOUR
T1 - Using textual bug reports to predict the fault category of software bugs
AU - Hirsch, Thomas
AU - Hofer, Birgit Gertraud
PY - 2022/9
Y1 - 2022/9
N2 - Debugging is a time-consuming and expensive process. Developers have to select appropriate tools, methods and approaches in order to efficiently reproduce, localize and fix bugs. These choices are based on the developers’ assessment of the type of fault for a given bug report. This paper proposes a machine learning (ML) based approach that predicts the fault type for a given textual bug report. We built a dataset from 70+ projects for training and evaluation of our approach. Further, we performed a user study to establish a baseline for non-expert human performance on this task. Our models, incorporating our custom preprocessing approaches, reach a macro average F1 score of up to 0.69 on this bug classification problem. We demonstrate inter-project transferability of our approach. Further, we identify and discuss issues and limitations of ML classification approaches applied to textual bug reports. Our models can support researchers in data collection efforts, for example in bug benchmark creation. In the future, such models could aid inexperienced developers in debugging tool selection, helping save time and resources.
KW - Bug report
KW - Bug benchmark
KW - Fault type prediction
U2 - 10.1016/j.array.2022.100189
DO - 10.1016/j.array.2022.100189
M3 - Article
SN - 2590-0056
VL - 15
JO - Array
JF - Array
M1 - 100189
ER -
