Liu, XH; Du, Y; Wang, J; Yu, Y; (2025) On the Optimization Landscape of Low Rank Adaptation Methods for Large Language Models. In: 13th International Conference on Learning Representations (ICLR 2025). ICLR.
Abstract
Training Large Language Models (LLMs) poses significant memory challenges, making low-rank adaptation methods an attractive solution. Previously, Low-Rank Adaptation (LoRA) addressed this by adding a trainable low-rank matrix to the frozen pre-trained weights in each layer, reducing the number of trainable parameters and optimizer states. GaLore, which compresses the gradient matrix instead of the weight matrix, has demonstrated superior performance to LoRA, with faster convergence and reduced memory consumption. Despite their empirical success, the performance of these methods has not been fully understood or explained theoretically. In this paper, we analyze the optimization landscapes of LoRA, GaLore, and full-rank methods, revealing that GaLore benefits from fewer spurious local minima and a larger region satisfying the PL∗ condition, a variant of the Polyak-Łojasiewicz (PL) condition, leading to faster convergence. Our analysis leads to a novel method, GaRare, which further improves GaLore by using gradient random projection to reduce computational overhead. Practically, GaRare achieves strong performance in both pre-training and fine-tuning tasks, offering a more efficient approach to large-scale model adaptation. Code is available at https://github.com/liuxhym/GaRare.git.
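The abstract describes three mechanisms: LoRA trains a low-rank additive update to frozen weights, GaLore compresses the gradient matrix into a low-rank subspace, and GaRare replaces GaLore's projector with a random projection. The following is a minimal PyTorch sketch of these ideas; the names `LoRALinear`, `random_project_grad`, and `expand_update` are illustrative and not taken from the paper's released code, and GaLore's actual SVD-based projector is only gestured at in a comment.

```python
import torch

# LoRA: the pre-trained weight W is frozen; only the rank-r factors A and B
# are trained, so the effective weight is W + B @ A.
class LoRALinear(torch.nn.Module):
    def __init__(self, in_dim: int, out_dim: int, rank: int = 8):
        super().__init__()
        self.W = torch.nn.Parameter(torch.randn(out_dim, in_dim), requires_grad=False)
        self.A = torch.nn.Parameter(torch.randn(rank, in_dim) * 0.01)  # trainable
        self.B = torch.nn.Parameter(torch.zeros(out_dim, rank))        # trainable

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.W + self.B @ self.A).T

# GaLore-style idea: compress the gradient G (not the weight) to rank r and
# keep optimizer state on the small matrix. GaLore derives the projector P
# from an SVD of G; per the abstract, GaRare instead draws P as a random
# projection, avoiding the SVD cost.
def random_project_grad(G: torch.Tensor, rank: int = 8, seed: int = 0):
    gen = torch.Generator().manual_seed(seed)
    P = torch.randn(G.shape[0], rank, generator=gen) / rank ** 0.5
    return P, P.T @ G  # (projector, compressed rank-r gradient)

def expand_update(P: torch.Tensor, R: torch.Tensor) -> torch.Tensor:
    return P @ R  # map the low-rank update back to the full weight shape
```

In an actual training loop, the optimizer's moment statistics would be maintained on the small r×n compressed gradient rather than the full m×n matrix, which is where the memory saving of gradient-compression methods comes from.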
| Type: | Proceedings paper |
|---|---|
| Title: | On the Optimization Landscape of Low Rank Adaptation Methods for Large Language Models |
| Event: | ICLR 2025 |
| Open access status: | An open access version is available from UCL Discovery |
| Publisher version: | https://openreview.net/forum?id=pxclAomHat |
| Language: | English |
| Additional information: | This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions. |
| Keywords: | large language model, LoRA, optimization |
| UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10212513 |