Plagiarism detection in SQL student assignments

Nikolai Scerbakov, Alexander Schukin, Oleg Sabinin

Publication: Chapter in book/report/conference proceedings; conference paper; peer-reviewed

Abstract

In this paper we present an original method for detecting similarity between SQL fragments. The method is based on identifying so-called "SQL lexemes", the persistent elements of an SQL statement, and "SQL variables", the easily modifiable elements of an SQL statement. Any SQL statement can thus be replaced with a so-called token: a sequence of SQL lexemes and SQL variables. The distance between two tokens can be calculated with a well-known algorithm such as the Levenshtein metric. A small Levenshtein distance between tokens indicates SQL statements that were built by modifying others.
We also present first practical results from applying the algorithm and discuss further development of the method.
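The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword set `SQL_KEYWORDS`, the regular expression `TOKEN_RE`, the placeholder `"VAR"`, and the function names `to_token` and `levenshtein` are all assumptions made for illustration. Keywords and punctuation are treated as lexemes, while identifiers and literals are treated as variables and collapsed to a single placeholder before the edit distance is computed over the resulting sequences.

```python
import re

# Assumed, deliberately small keyword set; a real tool would need a full list.
SQL_KEYWORDS = {
    "SELECT", "FROM", "WHERE", "AND", "OR", "NOT", "JOIN", "ON",
    "GROUP", "BY", "ORDER", "HAVING", "AS", "IN", "INSERT", "INTO",
    "VALUES", "UPDATE", "SET", "DELETE",
}

# Quoted strings, words/numbers, two-character operators, single symbols.
TOKEN_RE = re.compile(r"'[^']*'|\w+|<=|>=|<>|!=|[^\s\w]")


def to_token(sql):
    """Map an SQL statement to a sequence of lexemes and 'VAR' placeholders."""
    sequence = []
    for part in TOKEN_RE.findall(sql):
        upper = part.upper()
        if upper in SQL_KEYWORDS:
            sequence.append(upper)   # lexeme: keyword, case-normalized
        elif part[0] == "'" or part[0].isalnum() or part[0] == "_":
            sequence.append("VAR")   # variable: identifier or literal
        else:
            sequence.append(part)    # lexeme: operator or punctuation
    return sequence


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


original = "SELECT name, age FROM students WHERE age > 20"
renamed = "select n, a from pupils where a > 18"
different = "SELECT name FROM students ORDER BY name"

print(levenshtein(to_token(original), to_token(renamed)))
print(levenshtein(to_token(original), to_token(different)))
```

Under these assumptions, renaming columns and tables or changing a constant leaves the sequence unchanged, so the first pair has distance zero, while a structurally different query yields a larger distance. Where to set the threshold for "small" is left open here, since the abstract does not state one.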
Original language: English
Title: Proceedings of 20th International Conference on Interactive Collaborative Learning
Pages: 321-326
Number of pages: 6
Publication status: Published - 2017

ASJC Scopus subject areas

  • General Computer Science
