Marchisio, Kelly; Lewis, Patrick; Chen, Yihong; Artetxe, Mikel (2023) Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training. In: Findings of the Association for Computational Linguistics: ACL 2023 (pp. 5474-5490). Association for Computational Linguistics: Toronto, Canada.
Abstract
Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen. Despite learning a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full forward and backward pass over the entire model. We propose mini-model adaptation, a compute-efficient alternative that builds a shallow mini-model from a fraction of a large model’s parameters. New language-specific embeddings can then be efficiently trained over the mini-model and plugged into the aligned large model for rapid cross-lingual transfer. We explore two approaches to learn mini-models: MINIJOINT, which jointly pretrains the primary model and the mini-model using a single transformer with a secondary MLM head at a middle layer; and MINIPOST, where we start from a regular pretrained model, build a mini-model by extracting and freezing a few layers, and learn a small number of parameters on top. Experiments on XNLI, MLQA and PAWS-X show that mini-model adaptation matches the performance of the standard approach using up to 2.3x less compute on average.
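To make the setup described in the abstract concrete, the following PyTorch sketch illustrates how a MINIJOINT-style model with a secondary MLM head at a middle layer, and the embedding-only adaptation step over the shallow mini-model, might look. It is a minimal illustration under simplifying assumptions (toy sizes, one particular weight-tying choice, invented names such as `TinyMLM` and `new_vocab`), not the authors' implementation.

```python
import torch
import torch.nn as nn

class TinyMLM(nn.Module):
    """Toy masked LM with a primary MLM head on the last layer and a
    secondary head at a middle layer (the mini-model "exit").
    Sizes and layer counts are illustrative, not the paper's configuration."""
    def __init__(self, vocab_size=1000, d_model=64, n_layers=8, mini_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, dim_feedforward=128,
                                        batch_first=True)
             for _ in range(n_layers)]
        )
        self.mini_layers = mini_layers
        self.head_full = nn.Linear(d_model, vocab_size, bias=False)
        self.head_mini = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, x, use_mini=False):
        h = self.embed(x)
        depth = self.mini_layers if use_mini else len(self.layers)
        for layer in self.layers[:depth]:
            h = layer(h)
        head = self.head_mini if use_mini else self.head_full
        return head(h)

model = TinyMLM()

# Adaptation to a new language: freeze the transformer body and heads, swap in
# a fresh embedding table (tied here to the output projections), and train only
# the new embeddings, running forward/backward through the shallow mini-model.
for p in model.parameters():
    p.requires_grad = False

new_vocab = 1200                       # hypothetical new-language vocabulary size
new_embed = nn.Embedding(new_vocab, 64)
model.embed = new_embed
model.head_mini = nn.Linear(64, new_vocab, bias=False)
model.head_full = nn.Linear(64, new_vocab, bias=False)
model.head_mini.weight = new_embed.weight   # tie output projections to the new
model.head_full.weight = new_embed.weight   # embeddings (one possible choice)

opt = torch.optim.Adam(new_embed.parameters(), lr=1e-3)
tokens = torch.randint(0, new_vocab, (2, 16))    # stand-in for masked text
logits = model(tokens, use_mini=True)            # cheap pass: mini-model only
loss = nn.functional.cross_entropy(logits.reshape(-1, new_vocab),
                                   tokens.reshape(-1))
loss.backward()
opt.step()

# At transfer time, the trained embeddings are plugged into the aligned
# full-depth model rather than the mini-model.
full_logits = model(tokens, use_mini=False)
```

Because only the new embedding table requires gradients, the expensive part of adaptation is the forward and backward pass itself, which in this sketch runs over just the first few layers; that is the source of the compute savings the abstract reports.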
Type: | Proceedings paper |
---|---|
Title: | Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training |
Event: | Findings of the Association for Computational Linguistics: ACL 2023 |
Dates: | Jul 2023 - Jul 2023 |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.18653/v1/2023.findings-acl.338 |
Publisher version: | https://doi.org/10.18653/v1/2023.findings-acl.338 |
Language: | English |
Additional information: | © 1963–2025 ACL; other materials are copyrighted by their respective copyright holders. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License. |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/10211440 |