
Fine-Tuning via Linked Domains: A Closed-Form Dual Alignment Mechanism for Transferring Vision-Language Models

Lu, Peiyu; Li, Xiaoxu; Zhu, Rui; Ma, Zhanyu; Cao, Jie; Xue, Jing-Hao; (2025) Fine-Tuning via Linked Domains: A Closed-Form Dual Alignment Mechanism for Transferring Vision-Language Models. IEEE Transactions on Circuits and Systems for Video Technology 10.1109/tcsvt.2025.3613794. (In press).
Green open access

PeiyuLu-TCSVT-2025.pdf - Accepted Version (4MB)

Abstract

Adapters and prompt learning have become two de facto strategies for fine-tuning pre-trained vision-language models, mitigating the high computational cost of fine-tuning an entire model for downstream tasks. Both can align the predictions of the fine-tuned model with those of the pre-trained model. However, existing methods based on these strategies primarily focus on alignment within a single modality, and bidirectional interactions between modalities remain underexplored. To address this issue, we propose a closed-form dual alignment mechanism (DAM) that not only ensures consistent predictions within a single modality but also aligns features across modalities. In DAM, all alignments are obtained as closed-form solutions to ridge regression, without introducing a large number of learnable parameters. Experimental results demonstrate that DAM outperforms state-of-the-art methods on 11 benchmarks across various evaluation metrics. Our code is available at https://github.com/Peiy-Lu/DAM.
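
For readers unfamiliar with the phrase "closed-form solutions to ridge regression", the following is a minimal, generic sketch of that idea and not the authors' DAM implementation (see their repository for that). It assumes a feature matrix X is to be mapped linearly onto a target matrix Y; the function name `ridge_closed_form`, the regularization weight `lam`, and the random data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def ridge_closed_form(X, Y, lam=1.0):
    """Return W minimizing ||X W - Y||_F^2 + lam * ||W||_F^2.

    The minimizer is W = (X^T X + lam * I)^{-1} X^T Y, so no iterative
    training is needed.
    """
    d = X.shape[1]
    # Solve the regularized normal equations instead of forming an inverse.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Hypothetical data: 64 samples, 8-dim source features, 4-dim targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
W_true = rng.normal(size=(8, 4))
Y = X @ W_true

W = ridge_closed_form(X, Y, lam=1e-6)
```

Because the map W is computed in one linear solve, the only quantity to choose is the regularization weight, which is consistent with the abstract's point that alignment is achieved without a large number of learnable parameters.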

Type: Article
Title: Fine-Tuning via Linked Domains: A Closed-Form Dual Alignment Mechanism for Transferring Vision-Language Models
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/tcsvt.2025.3613794
Publisher version: https://doi.org/10.1109/tcsvt.2025.3613794
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Vision-language model, Fine-tuning, Feature alignment
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: https://discovery.ucl.ac.uk/id/eprint/10214695