Fine-Tuning via Linked Domains: A Closed-Form Dual Alignment Mechanism for Transferring Vision-Language Models


Lu, Peiyu; Li, Xiaoxu; Zhu, Rui; Ma, Zhanyu; Cao, Jie; Xue, Jing-Hao (2025) Fine-Tuning via Linked Domains: A Closed-Form Dual Alignment Mechanism for Transferring Vision-Language Models. IEEE Transactions on Circuits and Systems for Video Technology. DOI: 10.1109/tcsvt.2025.3613794. (In press). Green open access

Text: PeiyuLu-TCSVT-2025.pdf - Accepted Version (4MB)

Abstract

Adapters and prompt learning have become two de facto strategies for fine-tuning pre-trained vision-language models, mitigating the high computational cost of fine-tuning an entire model for downstream tasks. Both can align the predictions of the fine-tuned model with those of the pre-trained model. However, existing methods built on these strategies focus primarily on alignment within a single modality, and bidirectional interactions between modalities remain little explored. To address this issue, we propose a closed-form dual alignment mechanism (DAM) that not only ensures consistent predictions within a single modality but also aligns features across modalities. In DAM, all alignments are obtained as closed-form solutions to ridge regression, without introducing a large number of learnable parameters. Experimental results demonstrate that DAM outperforms state-of-the-art methods on 11 benchmarks across various evaluation metrics. Our code is available at https://github.com/Peiy-Lu/DAM.
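
For readers unfamiliar with the term, the sketch below illustrates what a closed-form solution to ridge regression looks like in general: the weight matrix W minimising ||XW - Y||^2 + lam * ||W||^2 is W = (X^T X + lam * I)^(-1) X^T Y, so it can be computed directly with no gradient-based training. This is a generic textbook illustration and not the authors' DAM implementation; the function names, the toy features, and the value of lam are all assumptions made for the example.

```python
# Generic closed-form ridge regression in pure Python.
# NOT the DAM code from the paper: every name and value here is hypothetical.

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ridge_fit(x, y, lam):
    """Return W minimising ||x W - y||^2 + lam * ||W||^2 (Frobenius norms)."""
    xt = transpose(x)
    gram = matmul(xt, x)
    for i in range(len(gram)):
        gram[i][i] += lam  # add lam * I to X^T X
    return solve(gram, matmul(xt, y))

# Toy data (assumed): map 2-d "source" features onto 2-d "target" features.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [[2.0, 0.0], [0.0, 3.0], [2.0, 3.0]]

w = ridge_fit(x, y, lam=1e-6)        # nearly unregularised fit
w_shrunk = ridge_fit(x, y, lam=10.0)  # stronger penalty shrinks the weights
```

With a very small lam the recovered map is close to the exact linear relation in the toy data, while a larger lam pulls the weights towards zero. The only parameters involved are the entries of W, which is consistent with the abstract's point that closed-form alignment avoids a large set of learnable parameters.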

Type:	Article
Title:	Fine-Tuning via Linked Domains: A Closed-Form Dual Alignment Mechanism for Transferring Vision-Language Models
Open access status:	An open access version is available from UCL Discovery
DOI:	10.1109/tcsvt.2025.3613794
Publisher version:	https://doi.org/10.1109/tcsvt.2025.3613794
Language:	English
Additional information:	This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords:	Vision-language model, Fine-tuning, Feature alignment
UCL classification:	UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI:	https://discovery.ucl.ac.uk/id/eprint/10214695
