Self-adapting Large Visual-Language Models to Edge Devices Across Visual Modalities

Cai, K; Duan, Z; Liu, G; Fleming, C; Lu, CX; (2025) Self-adapting Large Visual-Language Models to Edge Devices Across Visual Modalities. In: Leonardis, A and Ricci, E and Roth, S and Russakovsky, O and Sattler, T and Varol, G, (eds.) Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). (pp. 301-318). Springer Nature: Cham, Switzerland.

Full text: 2403.04908v3.pdf - Accepted Version (17MB)
Access restricted to UCL open access staff until 1 November 2025.

Abstract

Recent advancements in Vision-Language (VL) models have sparked interest in their deployment on edge devices, yet challenges in handling diverse visual modalities, manual annotation, and computational constraints remain. We introduce EdgeVL, a novel framework that bridges this gap by seamlessly integrating dual-modality knowledge distillation and quantization-aware contrastive learning. This approach enables the adaptation of large VL models, like CLIP, for efficient use with both RGB and non-RGB images on resource-limited devices without the need for manual annotations. EdgeVL not only transfers visual language alignment capabilities to compact models but also maintains feature quality post-quantization, significantly enhancing open-vocabulary classification performance across various visual modalities. Our work represents the first systematic effort to adapt large VL models for edge deployment, showcasing up to 15.4% accuracy improvements on multiple datasets and up to 93-fold reduction in model size. Code available at https://github.com/ramdrop/edgevl.
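The abstract names two ideas that a small example can make concrete: open-vocabulary classification by matching an image embedding against text embeddings, and the risk that quantization perturbs those embeddings. The following is a minimal sketch only, not the EdgeVL implementation. The function names, the toy three-dimensional embeddings, and the symmetric uniform quantizer are all assumptions chosen for illustration; the authors' code is at the repository linked above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fake_quantize(vec, bits=8):
    """Symmetric uniform quantize-then-dequantize (an illustrative assumption,
    not necessarily the quantizer used in the paper)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(x) for x in vec)
    if max_abs == 0:
        return list(vec)
    scale = max_abs / qmax
    # Snap each value to the nearest representable level, then map back to floats.
    return [round(x / scale) * scale for x in vec]


def classify(image_feat, text_feats):
    """Pick the label whose text embedding is most similar to the image embedding."""
    return max(text_feats, key=lambda label: cosine(image_feat, text_feats[label]))


# Hypothetical toy embeddings; a real system would obtain these from the encoders.
text_feats = {
    "a photo of a chair": [0.9, 0.1, 0.0],
    "a photo of a bed": [0.1, 0.9, 0.2],
}
image_feat = [0.8, 0.2, 0.1]

pred_full = classify(image_feat, text_feats)
pred_quant = classify(fake_quantize(image_feat, bits=4), text_feats)
drift = 1.0 - cosine(image_feat, fake_quantize(image_feat, bits=4))
print(pred_full, pred_quant, drift)
```

Here `drift` measures how far the quantized embedding moves from the full-precision one. In this toy case the prediction does not change, but with closely spaced classes or lower bit widths such drift could flip the result, which is the kind of degradation the abstract says the framework is designed to avoid.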

Type:	Proceedings paper
Title:	Self-adapting Large Visual-Language Models to Edge Devices Across Visual Modalities
Event:	Computer Vision – ECCV 2024
ISBN-13:	9783031733895
DOI:	10.1007/978-3-031-73390-1_18
Publisher version:	http://dx.doi.org/10.1007/978-3-031-73390-1_18
Language:	English
Additional information:	This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification:	UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI:	https://discovery.ucl.ac.uk/id/eprint/10200841
