Marchisio, Kelly; Lewis, Patrick; Chen, Yihong; Artetxe, Mikel (2023) Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training. In: Findings of the Association for Computational Linguistics: ACL 2023 (pp. 5474-5490). Association for Computational Linguistics: Toronto, Canada.
Abstract
Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen. Despite learning a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full forward and backward pass over the entire model. We propose mini-model adaptation, a compute-efficient alternative that builds a shallow mini-model from a fraction of a large model’s parameters. New language-specific embeddings can then be efficiently trained over the mini-model and plugged into the aligned large model for rapid cross-lingual transfer. We explore two approaches to learn mini-models: MINIJOINT, which jointly pretrains the primary model and the mini-model using a single transformer with a secondary MLM head at a middle layer; and MINIPOST, where we start from a regular pretrained model, build a mini-model by extracting and freezing a few layers, and learn a small number of parameters on top. Experiments on XNLI, MLQA and PAWS-X show that mini-model adaptation matches the performance of the standard approach using up to 2.3x less compute on average.
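To make the setup described in the abstract concrete, the following PyTorch sketch illustrates how a MINIJOINT-style model with a secondary MLM head at a middle layer, and the embedding-only adaptation step over the shallow mini-model, might look. It is a minimal illustration under simplifying assumptions (toy sizes, one particular weight-tying choice, invented names such as `TinyMLM` and `new_vocab`), not the authors' implementation.

```python
import torch
import torch.nn as nn

class TinyMLM(nn.Module):
    """Toy masked LM with a primary MLM head on the last layer and a
    secondary head at a middle layer (the mini-model "exit").
    Sizes and layer counts are illustrative, not the paper's configuration."""
    def __init__(self, vocab_size=1000, d_model=64, n_layers=8, mini_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, dim_feedforward=128,
                                        batch_first=True)
             for _ in range(n_layers)]
        )
        self.mini_layers = mini_layers
        self.head_full = nn.Linear(d_model, vocab_size, bias=False)
        self.head_mini = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, x, use_mini=False):
        h = self.embed(x)
        depth = self.mini_layers if use_mini else len(self.layers)
        for layer in self.layers[:depth]:
            h = layer(h)
        head = self.head_mini if use_mini else self.head_full
        return head(h)

model = TinyMLM()

# Adaptation to a new language: freeze the transformer body and heads, swap in
# a fresh embedding table (tied here to the output projections), and train only
# the new embeddings, running forward/backward through the shallow mini-model.
for p in model.parameters():
    p.requires_grad = False

new_vocab = 1200                       # hypothetical new-language vocabulary size
new_embed = nn.Embedding(new_vocab, 64)
model.embed = new_embed
model.head_mini = nn.Linear(64, new_vocab, bias=False)
model.head_full = nn.Linear(64, new_vocab, bias=False)
model.head_mini.weight = new_embed.weight   # tie output projections to the new
model.head_full.weight = new_embed.weight   # embeddings (one possible choice)

opt = torch.optim.Adam(new_embed.parameters(), lr=1e-3)
tokens = torch.randint(0, new_vocab, (2, 16))    # stand-in for masked text
logits = model(tokens, use_mini=True)            # cheap pass: mini-model only
loss = nn.functional.cross_entropy(logits.reshape(-1, new_vocab),
                                   tokens.reshape(-1))
loss.backward()
opt.step()

# At transfer time, the trained embeddings are plugged into the aligned
# full-depth model rather than the mini-model.
full_logits = model(tokens, use_mini=False)
```

Because only the new embedding table requires gradients, the expensive part of adaptation is the forward and backward pass itself, which in this sketch runs over just the first few layers; that is the source of the compute savings the abstract reports.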
Type: | Proceedings paper |
---|---|
Title: | Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training |
Event: | Findings of the Association for Computational Linguistics: ACL 2023 |
Dates: | Jul 2023 - Jul 2023 |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.18653/v1/2023.findings-acl.338 |
Publisher version: | https://doi.org/10.18653/v1/2023.findings-acl.338 |
Language: | English |
Additional information: | © 1963–2025 ACL; other materials are copyrighted by their respective copyright holders. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License. |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/10211440 |