UCL Discovery

On the Compression of Language Models for Code: An Empirical Study on CodeBERT

D'Aloisio, G; Traini, L; Sarro, F; Di Marco, A; (2025) On the Compression of Language Models for Code: An Empirical Study on CodeBERT. In: Proceedings of the 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2025). (pp. 12-23). IEEE.

PDF (Accepted Version): SANER_2025__Analysis_of_LLM_Compression_Methods.pdf (954kB)

Abstract

Language models have proven successful across a wide range of software engineering tasks, but their significant computational costs often hinder their practical adoption. To address this challenge, researchers have begun applying various compression strategies to improve the efficiency of language models for code. These strategies aim to optimize inference latency and memory usage, though often at the cost of reduced model effectiveness. However, there is still a significant gap in understanding how these strategies influence the efficiency and effectiveness of language models for code. Here, we empirically investigate the impact of three well-known compression strategies - knowledge distillation, quantization, and pruning - across three different classes of software engineering tasks: vulnerability detection, code summarization, and code search. Our findings reveal that the impact of these strategies varies greatly depending on the task and the specific compression method employed. Practitioners and researchers can use these insights to make informed decisions when selecting the most appropriate compression strategy, balancing both efficiency and effectiveness based on their specific needs.
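To make one of the three compression strategies named in the abstract concrete, the sketch below applies post-training dynamic quantization to CodeBERT with PyTorch and Hugging Face Transformers. It is an illustrative example only, not the authors' experimental setup: the checkpoint id, the choice of quantizing only Linear layers, and the sanity check are assumptions made for the sketch.

# Illustrative sketch (not the paper's setup): post-training dynamic quantization
# of CodeBERT, one of the three compression strategies compared in the study.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "microsoft/codebert-base"  # assumed public checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

# Quantize all Linear layers to int8 weights; activations remain in float.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Quick sanity check: encode the same code snippet with both models and
# compare the resulting hidden states.
inputs = tokenizer("def add(a, b): return a + b", return_tensors="pt")
with torch.no_grad():
    ref = model(**inputs).last_hidden_state
    out = quantized(**inputs).last_hidden_state
print("max abs difference after quantization:", (ref - out).abs().max().item())

Dynamic quantization shrinks the stored weights and can reduce CPU inference latency, but, as the study's abstract notes, the effectiveness trade-off depends on the downstream task and compression method.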

Type: Proceedings paper
Title: On the Compression of Language Models for Code: An Empirical Study on CodeBERT
Event: 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)
ISBN-13: 979-8-3315-3510-0
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/SANER64311.2025.00010
Publisher version: https://doi.org/10.1109/SANER64311.2025.00010
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions.
Keywords: Language Models, Compression Strategies, Software Quality, Empirical Study
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10210002
