Active learning approach to label network traffic datasets

Jorge L. Guerra Torres; Carlos A Catania; Eduardo Veas

doi:10.1016/j.jisa.2019.102388

Institute of Interactive Systems and Data Science (7060)

Publication: Contribution to journal › Article › peer-review

Abstract

In the field of network security, labeling a network traffic dataset is especially expensive, since expert knowledge is required to perform the annotations. With the aid of visual analytics applications such as RiskID, the effort of labeling network traffic is considerably reduced. However, since label assignment still requires an expert to weigh several factors, the annotation process remains a difficult task. This article introduces a novel active learning strategy for building a random forest model based on connections previously labeled by the user. The resulting model provides the user with probability estimates for the remaining unlabeled connections, helping them in the traffic annotation task. The article describes the active learning strategy, the interfaces with the RiskID system, the algorithms used to predict botnet behavior, and a proposed evaluation framework. The evaluation framework includes studies to assess not only the prediction performance of the active learning strategy but also its learning rate and resilience against noise, as well as its improvements over other well-known labeling strategies. The framework represents a complete methodology for evaluating the performance of any active learning solution. The evaluation results show that the proposed approach is a significant improvement over previous labeling strategies.

Original language	English
Article number	102388
Number of pages	13
Journal	Journal of Information Security and Applications
Volume	49
DOIs	https://doi.org/10.1016/j.jisa.2019.102388
Publication status	Published - 2019

Access to document

10.1016/j.jisa.2019.102388

Cite this

@article{b60abf6d51f44aacb4813b138737a393,

title = "Active learning approach to label network traffic datasets",

abstract = "In the field of network security, labeling a network traffic dataset is especially expensive, since expert knowledge is required to perform the annotations. With the aid of visual analytics applications such as RiskID, the effort of labeling network traffic is considerably reduced. However, since label assignment still requires an expert to weigh several factors, the annotation process remains a difficult task. This article introduces a novel active learning strategy for building a random forest model based on connections previously labeled by the user. The resulting model provides the user with probability estimates for the remaining unlabeled connections, helping them in the traffic annotation task. The article describes the active learning strategy, the interfaces with the RiskID system, the algorithms used to predict botnet behavior, and a proposed evaluation framework. The evaluation framework includes studies to assess not only the prediction performance of the active learning strategy but also its learning rate and resilience against noise, as well as its improvements over other well-known labeling strategies. The framework represents a complete methodology for evaluating the performance of any active learning solution. The evaluation results show that the proposed approach is a significant improvement over previous labeling strategies.",

author = "{Guerra Torres}, {Jorge L.} and Catania, {Carlos A} and Veas, Eduardo",

year = "2019",

doi = "10.1016/j.jisa.2019.102388",

language = "English",

volume = "49",

journal = "Journal of Information Security and Applications",

issn = "2214-2126",

publisher = "Elsevier Limited",

}

TY - JOUR

T1 - Active learning approach to label network traffic datasets

AU - Guerra Torres, Jorge L.

AU - Catania, Carlos A

AU - Veas, Eduardo

PY - 2019

Y1 - 2019

N2 - In the field of network security, labeling a network traffic dataset is especially expensive, since expert knowledge is required to perform the annotations. With the aid of visual analytics applications such as RiskID, the effort of labeling network traffic is considerably reduced. However, since label assignment still requires an expert to weigh several factors, the annotation process remains a difficult task. This article introduces a novel active learning strategy for building a random forest model based on connections previously labeled by the user. The resulting model provides the user with probability estimates for the remaining unlabeled connections, helping them in the traffic annotation task. The article describes the active learning strategy, the interfaces with the RiskID system, the algorithms used to predict botnet behavior, and a proposed evaluation framework. The evaluation framework includes studies to assess not only the prediction performance of the active learning strategy but also its learning rate and resilience against noise, as well as its improvements over other well-known labeling strategies. The framework represents a complete methodology for evaluating the performance of any active learning solution. The evaluation results show that the proposed approach is a significant improvement over previous labeling strategies.

AB - In the field of network security, labeling a network traffic dataset is especially expensive, since expert knowledge is required to perform the annotations. With the aid of visual analytics applications such as RiskID, the effort of labeling network traffic is considerably reduced. However, since label assignment still requires an expert to weigh several factors, the annotation process remains a difficult task. This article introduces a novel active learning strategy for building a random forest model based on connections previously labeled by the user. The resulting model provides the user with probability estimates for the remaining unlabeled connections, helping them in the traffic annotation task. The article describes the active learning strategy, the interfaces with the RiskID system, the algorithms used to predict botnet behavior, and a proposed evaluation framework. The evaluation framework includes studies to assess not only the prediction performance of the active learning strategy but also its learning rate and resilience against noise, as well as its improvements over other well-known labeling strategies. The framework represents a complete methodology for evaluating the performance of any active learning solution. The evaluation results show that the proposed approach is a significant improvement over previous labeling strategies.

U2 - 10.1016/j.jisa.2019.102388

DO - 10.1016/j.jisa.2019.102388

M3 - Article

SN - 2214-2126

VL - 49

JO - Journal of Information Security and Applications

JF - Journal of Information Security and Applications

M1 - 102388

ER -
