Plagiarism detection in SQL student assignments

Nikolai Scerbakov, Alexander Schukin, Oleg Sabinin

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this paper we present an original method for detecting similarity between SQL fragments. The method is based on identifying so-called "SQL lexemes", the persistent elements of an SQL statement, and "SQL variables", the easily modifiable elements of an SQL statement. Any SQL statement can thus be replaced with a so-called token: a sequence of SQL lexemes and SQL variables. The distance between SQL tokens can be calculated with a well-known algorithm such as the Levenshtein metric. Small Levenshtein distances between tokens identify SQL statements that were built by modifying others.
We also present the first practical results of applying the algorithm and discuss further development of the method.
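
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword set `SQL_LEXEMES`, the placeholder `"V"` for SQL variables, the regular expression, and the function names are all assumptions made for illustration, since the abstract does not specify how lexemes and variables are classified.

```python
import re

# Assumption: a small, hypothetical set of keywords treated as "SQL lexemes".
# The paper's actual classification is not given in the abstract.
SQL_LEXEMES = {
    "SELECT", "FROM", "WHERE", "AND", "OR", "NOT", "JOIN", "ON", "GROUP",
    "BY", "ORDER", "HAVING", "AS", "IN", "INSERT", "INTO", "VALUES",
    "UPDATE", "SET", "DELETE", "DISTINCT",
}

# String literals, numbers, words, two-character operators, single symbols.
TOKEN_RE = re.compile(r"'[^']*'|\d+(?:\.\d+)?|\w+|<>|<=|>=|!=|[^\s\w]")


def to_token(sql):
    """Replace an SQL statement with a sequence of lexemes and variables."""
    sequence = []
    for raw in TOKEN_RE.findall(sql):
        upper = raw.upper()
        if upper in SQL_LEXEMES:
            sequence.append(upper)   # persistent element: keep the keyword
        elif raw[0] == "'" or raw[0].isalnum() or raw[0] == "_":
            sequence.append("V")     # identifier or literal: easily modified
        else:
            sequence.append(raw)     # operators and punctuation kept as-is
    return sequence


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]


original = "SELECT name FROM students WHERE age > 20"
renamed = "select n from pupils where years > 18"
different = "SELECT name FROM students ORDER BY name"

print(levenshtein(to_token(original), to_token(renamed)))    # 0
print(levenshtein(to_token(original), to_token(different)))  # 3
```

In this sketch, renaming columns and tables or changing constants leaves the token unchanged, so the distance is 0, while a structurally different statement yields a larger distance. Where to set the threshold for "small" is left open here, as the abstract does not state one.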
Language: English
Title of host publication: Proceedings of 20th International Conference on Interactive Collaborative Learning
Pages: 321-326
Number of pages: 6
Status: Published - 2017

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Scerbakov, N., Schukin, A., & Sabinin, O. (2017). Plagiarism detection in SQL student assignments. In Proceedings of 20th International Conference on Interactive Collaborative Learning (pp. 321-326)