eprintid: 10195822
rev_number: 7
eprint_status: archive
userid: 699
dir: disk0/10/19/58/22
datestamp: 2024-08-16 11:56:11
lastmod: 2024-08-16 11:56:11
status_changed: 2024-08-16 11:56:11
type: proceedings_section
metadata_visibility: show
sword_depositor: 699
creators_name: Shi, Z
creators_name: Lipani, A
title: DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
ispublished: pub
divisions: UCL
divisions: B04
divisions: F44
keywords: Natural Language Processing, Large Language Models, Parameter-efficient Fine-tuning
note: This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
abstract: Prompt tuning (PT), where a small number of trainable soft (continuous) prompt vectors are affixed to the model input, has shown promising results across various tasks and model architectures for parameter-efficient fine-tuning (PEFT). PT stands out from other PEFT approaches because it maintains competitive performance with fewer trainable parameters and does not drastically scale up its parameters as the model size expands. However, PT introduces extra soft prompt tokens, leading to longer input sequences, which significantly impacts training/inference time and memory usage due to the Transformer's quadratic complexity. This is particularly concerning for Large Language Models (LLMs), which face heavy daily querying. To address this issue, we propose Decomposed Prompt Tuning (DePT), which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are then optimised with two different learning rates. This allows DePT to achieve better performance while saving substantial memory and time costs compared to vanilla PT and its variants, without changing the number of trainable parameters. Through extensive experiments on 23 natural language processing (NLP) and vision-language (VL) tasks, we demonstrate that DePT outperforms state-of-the-art PEFT approaches, including the full fine-tuning baseline, in some scenarios. Additionally, we empirically show that DePT grows more efficient as the model size increases. Our further study reveals that DePT integrates seamlessly with parameter-efficient transfer learning in the few-shot learning setting and highlights its adaptability to various model architectures and sizes.
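note_on_method: The abstract describes the decomposition only at a high level. The PyTorch sketch below is an illustrative reading, assuming that the shorter soft prompt is prepended to the input and that the product of the low-rank pair updates the frozen word embeddings, with the two parts given separate learning rates as the abstract states. The class name, dimensions, and learning-rate values are placeholders, not the authors' code.
```python
import torch
import torch.nn as nn

class DePTEmbedding(nn.Module):
    """Minimal sketch of Decomposed Prompt Tuning (DePT).

    A vanilla soft prompt of length l is replaced by:
      * a shorter trainable soft prompt of length m < l, and
      * a low-rank pair (A, B) whose product is added to the frozen
        word embeddings of the input tokens.
    All names and hyperparameters here are illustrative assumptions.
    """

    def __init__(self, frozen_embedding: nn.Embedding,
                 short_prompt_len: int = 40, rank: int = 8,
                 max_seq_len: int = 256):
        super().__init__()
        d = frozen_embedding.embedding_dim
        self.frozen_embedding = frozen_embedding.requires_grad_(False)
        # Shorter soft prompt (m x d), prepended to every sequence.
        self.soft_prompt = nn.Parameter(torch.randn(short_prompt_len, d) * 0.02)
        # Low-rank pair: (max_seq_len x r) @ (r x d) update of the word embeddings.
        self.lora_a = nn.Parameter(torch.randn(max_seq_len, rank) * 0.02)
        self.lora_b = nn.Parameter(torch.zeros(rank, d))

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # input_ids: (batch, seq_len) with seq_len <= max_seq_len
        batch, seq_len = input_ids.shape
        word_emb = self.frozen_embedding(input_ids)
        # Add the low-rank update to the frozen word embeddings.
        delta = (self.lora_a[:seq_len] @ self.lora_b).unsqueeze(0)
        word_emb = word_emb + delta
        # Prepend the shorter soft prompt; a real model would also
        # extend the attention mask by short_prompt_len positions.
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, word_emb], dim=1)


# Two learning rates, as the abstract indicates: one for the short prompt,
# another for the low-rank pair. Values and vocabulary size are placeholders.
model = DePTEmbedding(nn.Embedding(32128, 768))
optimizer = torch.optim.AdamW([
    {"params": [model.soft_prompt], "lr": 3e-1},
    {"params": [model.lora_a, model.lora_b], "lr": 5e-4},
])
```
Because the combined soft prompt is shorter than in vanilla PT, the input sequence fed to the Transformer is shorter, which is where the memory and time savings described in the abstract come from.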
date: 2024-05-11
date_type: published
publisher: International Conference on Learning Representations (ICLR)
official_url: https://openreview.net/forum?id=KjegfPGRde
oa_status: green
full_text_type: pub
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 2305017
lyricists_name: Lipani, Aldo
lyricists_id: ALIPA33
actors_name: Flynn, Bernadette
actors_id: BFFLY94
actors_role: owner
full_text_status: public
pres_type: paper
series: ICLR
publication: 12th International Conference on Learning Representations, ICLR 2024
volume: 2024
place_of_pub: Vienna, Austria
event_title: 12th International Conference on Learning Representations, ICLR 2024
book_title: 12th International Conference on Learning Representations, ICLR 2024
citation: Shi, Z; Lipani, A; (2024) DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning. In: 12th International Conference on Learning Representations, ICLR 2024. International Conference on Learning Representations (ICLR): Vienna, Austria. Green open access
document_url: https://discovery.ucl.ac.uk/id/eprint/10195822/1/725_DePT_Decomposed_Prompt_Tun.pdf