Introduction. Code borrowing among students worries many university instructors who teach programming-related disciplines. It is difficult for a teacher to verify personally that every student's work is original. Moreover, a large share of working time goes to checking submissions rather than, for example, preparing better assignments or keeping up with current trends in programming. Plagiarism checking systems can help detect borrowed program code, but most of them are designed to search for plagiarism in plain text and are therefore not adapted to program code, which has its own characteristics.
The purpose of the article is to analyze several currently available plagiarism detection systems, to formulate requirements for a new system, and to describe the plagiarism evaluation system developed by the authors.
Results. The paper considers four programs designed to detect plagiarism: Measure Of Software Similarity (MOSS), Codequiry, Unicheck, and CCFinderX. For each program, the advantages and disadvantages identified in comparison with the other programs studied are given. The comparative analysis revealed the following characteristics: most of these systems have a server; most have their own public or private databases of samples to compare against; all have their own advanced algorithms based on known ones, and some even use artificial intelligence; all have a graphical interface. The following shortcomings were also identified: some systems, such as MOSS and CCFinderX, rely on a desktop client, i.e., they require downloads over the network and additional configuration; modern systems, such as Unicheck and Codequiry, are paid; and none has a centralized system for displaying results, i.e., the teacher cannot track which students have submitted the task.
Based on the comparative evaluation of the analyzed systems, the authors created their own system for evaluating plagiarism in program code. It is implemented as a server with the following capabilities: user registration and authorization; checking files for plagiarism and retrieving the result. The server supports two user roles: User and Admin. The following technologies were used: Java 11 as the main programming language; IntelliJ IDEA as the development environment; the Gradle build tool; the Spring Boot 2 framework; and REST API, JSON, OAuth2, and JWT. The main algorithm that checks program code for plagiarism is the Wagner-Fischer algorithm, which computes the Levenshtein distance. The article describes the Levenshtein distance algorithm and gives an example of using it to calculate the distance between two strings.
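To make the distance computation concrete, the following is a minimal sketch of the Wagner-Fischer dynamic-programming scheme for the Levenshtein distance between two strings. It is an illustration written for this summary, not the code of the described server: the class name Levenshtein and the method name distance are assumptions, and the two-row memory layout is one common variant of the algorithm.

```java
import java.util.Objects;

/** Illustrative Levenshtein distance (Wagner-Fischer); names are hypothetical. */
final class Levenshtein {

    private Levenshtein() { }

    /** Returns the minimum number of insertions, deletions and substitutions turning a into b. */
    static int distance(CharSequence a, CharSequence b) {
        Objects.requireNonNull(a, "a");
        Objects.requireNonNull(b, "b");

        // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into the first j chars of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning the first i chars of a into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

Keeping only two rows reduces memory use from O(n·m) to O(m) while the running time stays O(n·m); a full matrix would be needed only if the edit script itself had to be recovered.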
Conclusion. The work reviews several well-known systems for detecting plagiarism in program code and describes the advantages and disadvantages of each. All of the considered systems follow a common procedure: accept an input file for evaluation; convert it into a list of tokens; apply their own algorithm to evaluate the code for plagiarism; return the result to the user. An algorithm based on a token representation was developed. A server for evaluating plagiarism in code was implemented, taking into account the advantages and disadvantages of the analyzed systems; its structure is given in the article. The resulting software product can act as a server that provides an API for applications designed to check program code for plagiarism.
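The common procedure listed above (accept a file, convert it into tokens, compare, report a result) can be sketched as follows. This is a hedged illustration only: the regex-based tokenizer, the small keyword set, the ID and NUM placeholders, and the normalized similarity score are all assumptions made for the example and are not taken from the article, which does not specify these details here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative token-based comparison; tokenizer and scoring are hypothetical. */
final class TokenSimilarity {

    // Assumed lexical rules: identifiers/keywords, integer literals, any other non-space char.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z_0-9]*|\\d+|\\S");
    // Assumed, deliberately tiny keyword list for the demo.
    private static final Set<String> KEYWORDS =
            Set.of("int", "return", "if", "else", "for", "while");

    private TokenSimilarity() { }

    /** Splits source into tokens, replacing identifiers with ID and numbers with NUM. */
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String t = m.group();
            if (t.matches("\\d+")) {
                tokens.add("NUM");
            } else if (t.matches("[A-Za-z_][A-Za-z_0-9]*") && !KEYWORDS.contains(t)) {
                tokens.add("ID"); // renaming a variable does not change the token stream
            } else {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Levenshtein distance over token lists (Wagner-Fischer, two rows). */
    static int distance(List<String> a, List<String> b) {
        int[] prev = new int[b.size() + 1];
        int[] curr = new int[b.size() + 1];
        for (int j = 0; j <= b.size(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.size(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.size(); j++) {
                int cost = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                curr[j] = Math.min(prev[j - 1] + cost,
                        Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.size()];
    }

    /** Assumed score in [0, 1]: 1 means identical token streams. */
    static double similarity(String left, String right) {
        List<String> a = tokenize(left);
        List<String> b = tokenize(right);
        int maxLen = Math.max(a.size(), b.size());
        return maxLen == 0 ? 1.0 : 1.0 - (double) distance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        // Same code with renamed variable and changed literal: identical token streams.
        System.out.println(similarity("int a = 5; return a;", "int total = 42; return total;")); // prints 1.0
    }
}
```

Comparing normalized tokens rather than raw characters makes the score insensitive to renamed variables and changed literals, which is the usual motivation for a token representation; a production tokenizer would need a real lexer for the target language.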