UCL Discovery

Beyond the TESSERACT: Trustworthy Dataset Curation for Sound Evaluations of Android Malware Classifiers

Chow, Theo; D'Onghia, Mario; Linhardt, Lorenz; Kan, Zeliang; Arp, Daniel; Cavallaro, Lorenzo; Pierazzi, Fabio; (2026) Beyond the TESSERACT: Trustworthy Dataset Curation for Sound Evaluations of Android Malware Classifiers. In: Proceedings of the 4th IEEE Conference on Secure and Trustworthy Machine Learning. IEEE: Munich, Germany. (In press). Green open access

Abstract

The reliability of machine learning critically depends on dataset quality. While machine learning applied to computer vision and natural language processing benefits from high-quality benchmark datasets, cyber security often falls behind, as quality is tied to the ability to access hard-to-obtain realistic data that may evolve over time. Android, however, is uniquely positioned in this ecosystem thanks to AndroZoo and other sources, which provide large-scale, continuously updated, and timestamped repositories of benign and malicious apps. Since their release, such data sources have provided access to populations of Android apps that researchers can sample from to evaluate learning-based methods in realistic settings, i.e., over temporal frames that account for app evolution (natural distribution shift) and with test datasets that reflect in-the-wild class ratios. Surprisingly, we observe that despite this abundance of data, performance discrepancies of learning-based Android malware classifiers persist even after such realistic requirements are satisfied, which challenges our ability to understand what the state of the art in this field is. In this work, we identify five novel factors that influence such discrepancies; we show that these factors have been largely overlooked and demonstrate the impact they have on sound evaluations. Our findings and recommendations help define a methodology for creating trustworthy datasets for sound evaluations of Android malware classifiers.

Type: Proceedings paper
Title: Beyond the TESSERACT: Trustworthy Dataset Curation for Sound Evaluations of Android Malware Classifiers
Event: IEEE SaTML 2026: 4th IEEE Conference on Secure and Trustworthy Machine Learning
Open access status: An open access version is available from UCL Discovery
Publisher version: https://satml.org/
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10220473