UCL Discovery

A machine learning approach to vulnerability detection combining software metrics and topic modelling: Evidence from smart contracts

Ibba, Giacomo; Neykova, Rumyana; Ortu, Marco; Tonelli, Roberto; Counsell, Steve; Destefanis, Giuseppe (2025) A machine learning approach to vulnerability detection combining software metrics and topic modelling: Evidence from smart contracts. Machine Learning with Applications, 22, Article 100759. 10.1016/j.mlwa.2025.100759. Green open access

1-s2.0-S2666827025001422-main (1).pdf - Accepted Version

Abstract

This paper introduces a methodology for software vulnerability detection that combines structural and semantic analysis through software metrics and topic modelling. We evaluate the approach using smart contracts as a case study, focusing on their structural properties and the presence of known security vulnerabilities. We identify the most relevant metrics for vulnerability detection, evaluate multiple machine learning classifiers for both binary and multi-label classification, and improve classification performance by integrating topic modelling techniques. Our analysis shows that metrics such as cyclomatic complexity, nesting depth, and function calls are strongly associated with vulnerability presence. Using these metrics, the Random Forest classifier achieved strong performance in binary classification (AUC: 0.982, accuracy: 0.977, F1-score: 0.808) and multi-label classification (AUC: 0.951, accuracy: 0.729, F1-score: 0.839). The addition of topic modelling using Non-Negative Matrix Factorisation further improved results, increasing the F1-score to 0.881. The evaluation is conducted on Ethereum smart contracts written in Solidity.
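As a rough illustration of the structural metrics named in the abstract (cyclomatic complexity, nesting depth, and function calls), the sketch below estimates them from Solidity source text using only regular expressions and brace counting. It is not the authors' extraction tooling, which this record does not describe. The function names, the keyword lists, and the sample contract are all hypothetical, and the heuristics ignore comments, string literals, and type casts.

```python
import re

# Assumption: each branching keyword or short-circuit operator adds one decision point.
DECISION_PATTERN = re.compile(r"\b(?:if|for|while)\b|&&|\|\|")
# Identifier followed by "(": a crude proxy for a call site.
CALL_PATTERN = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
# Hypothetical exclusion list: keywords that are followed by "(" but are not calls.
NON_CALL_KEYWORDS = {"if", "for", "while", "function", "returns", "mapping"}


def cyclomatic_complexity(source: str) -> int:
    """Approximate complexity as 1 plus the number of decision points."""
    return 1 + len(DECISION_PATTERN.findall(source))


def max_nesting_depth(source: str) -> int:
    """Deepest level of curly-brace nesting (braces in comments are not skipped)."""
    depth = max_depth = 0
    for ch in source:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth = max(depth - 1, 0)
    return max_depth


def count_function_calls(source: str) -> int:
    """Count call-like sites, skipping keywords and function declarations."""
    calls = 0
    for match in CALL_PATTERN.finditer(source):
        if match.group(1) in NON_CALL_KEYWORDS:
            continue
        # Skip "function name(" declarations.
        if source[:match.start()].rstrip().endswith("function"):
            continue
        calls += 1
    return calls


def extract_metrics(source: str) -> dict:
    """Bundle the three approximate metrics into one feature dictionary."""
    return {
        "cyclomatic_complexity": cyclomatic_complexity(source),
        "max_nesting_depth": max_nesting_depth(source),
        "function_calls": count_function_calls(source),
    }


# Hypothetical sample contract, used only to exercise the functions above.
SAMPLE = """
contract Wallet {
    uint total;

    function withdraw(uint amount) public {
        if (amount > 0 && amount <= total) {
            for (uint i = 0; i < 2; i++) {
                log(i);
            }
            total -= amount;
            msg.sender.transfer(amount);
        }
    }
}
"""

metrics = extract_metrics(SAMPLE)
```

A feature dictionary of this kind could be computed per contract and passed to a classifier such as Random Forest. A production pipeline would more plausibly derive the metrics from a parsed syntax tree than from raw text.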

Type: Article
Title: A machine learning approach to vulnerability detection combining software metrics and topic modelling: Evidence from smart contracts
Open access status: An open access version is available from UCL Discovery
DOI: 10.1016/j.mlwa.2025.100759
Publisher version: https://doi.org/10.1016/j.mlwa.2025.100759
Language: English
Additional information: © 2025 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Keywords: Vulnerability detection, Software metrics, Topic modelling, Machine learning, Source code analysis, Smart contracts
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10216020