
Siamese: scalable and incremental code clone search via multiple code representations

Ragkhitwetsagul, C; Krinke, J (2019) Siamese: scalable and incremental code clone search via multiple code representations. Empirical Software Engineering, 24, pp. 2236-2284. 10.1007/s10664-019-09697-7. Green open access

siamese_aam.pdf - Accepted Version


Abstract

This paper presents a novel code clone search technique that is accurate, incremental, and scalable to hundreds of millions of lines of code. Our technique incorporates multiple code representations (i.e., a technique to transform code into various representations to capture different types of clones), query reduction (i.e., a technique to select clone search keywords based on their uniqueness), and a customised ranking function (i.e., a technique to allow a specific clone type to be ranked on top of the search results) to improve clone search performance. We implemented the technique in a clone search tool, called Siamese, and evaluated its search accuracy and scalability on three established clone data sets. Siamese offers the highest mean average precision, 95% and 99%, on two clone benchmarks compared to seven state-of-the-art clone detection tools, and reported the largest number of Type-3 clones compared to three other code search engines. Siamese is scalable and can return cloned code snippets within 8 seconds for a code corpus of 365 million lines of code. Using an index of 130,719 GitHub projects, we demonstrate that Siamese’s incremental indexing capability dramatically decreases the index preparation time for large-scale data sets with multiple releases of software projects. The paper discusses the applications of Siamese to facilitate software development and research through two use cases: online code clone detection and clone search with automated license analysis.
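
The three ideas named in the abstract (multiple code representations, query reduction, and a customised ranking function) can be illustrated with a minimal Python sketch. This is not the Siamese implementation. The tokenizer, the small keyword set, the use of document frequency as the measure of "uniqueness", the `keep` ratio, and the per-representation boosts are all assumptions made for illustration, and every function name here is hypothetical.

```python
import re

# Assumption: a tiny keyword set standing in for a real language grammar.
KEYWORDS = {"int", "for", "if", "else", "return", "while"}


def tokenize(code):
    """Split code into identifiers, integer literals, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def normalized_representation(tokens):
    """Replace identifiers and numbers with placeholders so that
    renamed copies of a snippet map to the same token sequence."""
    out = []
    for tok in tokens:
        if tok in KEYWORDS:
            out.append(tok)
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            out.append("ID")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


def reduce_query(terms, doc_freq, keep=0.5):
    """Keep the rarest fraction of query terms.
    Assumption: 'uniqueness' is approximated by low document frequency."""
    unique = sorted(set(terms), key=lambda t: (doc_freq.get(t, 0), t))
    return unique[: max(1, int(len(unique) * keep))]


def score(query_reps, doc_reps, boosts):
    """Sum term overlap per representation, weighted by a boost.
    Raising the boost of one representation favours the clone type
    that representation captures."""
    total = 0.0
    for name, q_terms in query_reps.items():
        overlap = len(set(q_terms) & set(doc_reps.get(name, [])))
        total += boosts.get(name, 1.0) * overlap
    return total
```

In this sketch, `int x = 5;` and `int y = 7;` differ in their original token sequences but share one normalized representation, which is how a renamed copy can still be matched. Weighting the original representation more heavily than the normalized one in `score` would push exact copies above renamed ones in the result list.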

Type: Article
Title: Siamese: scalable and incremental code clone search via multiple code representations
Open access status: An open access version is available from UCL Discovery
DOI: 10.1007/s10664-019-09697-7
Publisher version: https://doi.org/10.1007/s10664-019-09697-7
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Code clone search, Code search engine
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10070010