CodeGrid: A Grid Representation of Code

Kaboré, AK; Barr, ET; Klein, J; Bissyandé, TF (2023) CodeGrid: A Grid Representation of Code. In: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (pp. 1357-1369). ACM. Green open access.

PDF: 3597926.3598141.pdf (Published Version, 773 kB)

Abstract

Code representation is a key step in the application of AI in software engineering. Generic NLP representations are effective but do not exploit all the rich structure inherent to code. Recent work has focused on extracting abstract syntax trees (ASTs) and integrating their structural information into code representations. These AST-enhanced representations advanced the state of the art and accelerated new applications of AI to software engineering. ASTs, however, neglect important aspects of code structure, notably control and data flow, leaving some potentially relevant code signal unexploited. For example, purely image-based representations perform nearly as well as AST-based representations, despite the fact that they must learn even to recognize tokens, let alone their semantics. This result, from prior work, is strong evidence that these new code representations can still be improved; it also raises the question of just what signal image-based approaches are exploiting. We answer this question. We show that code is spatial and exploit this fact to propose CodeGrid, a new representation that embeds tokens into a grid that preserves code layout. Unlike some of the existing state of the art, CodeGrid is agnostic to the downstream task: whether that task is generation or classification, CodeGrid can complement the learning algorithm with spatial signal. For example, we show that CNNs, which are inherently spatially aware models, can exploit CodeGrid's outputs to effectively tackle fundamental software engineering tasks, such as code classification, code clone detection and vulnerability detection. PixelCNN leverages CodeGrid's grid representations to achieve code completion. Through extensive experiments, we validate our spatial code hypothesis, quantifying model performance as we vary the degree to which the representation preserves the grid.
To demonstrate its generality, we show that CodeGrid augments models, improving their performance on a range of tasks. On clone detection, CodeGrid improves ASTNN's performance by 3.3% in F1 score.
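The central idea, embedding tokens into a grid that preserves code layout, can be illustrated with a minimal sketch. This is not the authors' implementation. The regex tokenizer, the choice to write each token's id only into the cell where the token starts, the tab width of 4, and the helper names `build_grid` and `to_sequence` are all hypothetical. `to_sequence` reads the non-empty cells back in row order, which roughly corresponds to the layout-free view a sequence model sees. Comparing the two outputs shows what spatial signal the grid retains.

```python
import re
from typing import Dict, List

# Hypothetical tokenizer (an assumption, not the paper's): identifiers,
# integer literals, or any single non-whitespace character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
PAD_ID = 0  # id for empty cells (whitespace, or no token starts here)


def build_grid(source: str, vocab: Dict[str, int], tab_size: int = 4) -> List[List[int]]:
    """Place each token's id at the (line, column) where the token starts,
    so the 2D grid keeps the code's indentation and alignment."""
    lines = source.expandtabs(tab_size).splitlines()
    width = max((len(line) for line in lines), default=0)
    grid = [[PAD_ID] * width for _ in lines]
    for row, line in enumerate(lines):
        for match in TOKEN_RE.finditer(line):
            # Unseen tokens get the next free id; 0 stays reserved for padding.
            token_id = vocab.setdefault(match.group(), len(vocab) + 1)
            grid[row][match.start()] = token_id
    return grid


def to_sequence(grid: List[List[int]]) -> List[int]:
    """Discard the layout: read non-empty cells row by row into a flat list."""
    return [cell for row in grid for cell in row if cell != PAD_ID]


if __name__ == "__main__":
    vocab: Dict[str, int] = {}
    grid = build_grid("if x:\n    y = 1\n", vocab)
    for row in grid:
        print(row)
    # [1, 0, 0, 2, 3, 0, 0, 0, 0]
    # [0, 0, 0, 0, 4, 0, 5, 0, 6]
    print(to_sequence(grid))  # [1, 2, 3, 4, 5, 6]
```

Running `build_grid` on the same tokens without the indentation (`"if x:\ny = 1\n"`) yields a different grid, `[[1, 0, 0, 2, 3], [4, 0, 5, 0, 6]]`, but the same flat sequence. That difference is the layout signal a sequence-only representation discards and a spatially aware model such as a CNN can, in principle, pick up.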

Type:	Proceedings paper
Title:	CodeGrid: A Grid Representation of Code
Event:	ISSTA '23: 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis
Open access status:	An open access version is available from UCL Discovery
DOI:	10.1145/3597926.3598141
Publisher version:	https://doi.org/10.1145/3597926.3598141
Language:	English
Additional information:	This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. See: https://creativecommons.org/licenses/by-sa/4.0/
UCL classification:	UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI:	https://discovery.ucl.ac.uk/id/eprint/10175716
