UCL Discovery

Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training

Marchisio, Kelly; Lewis, Patrick; Chen, Yihong; Artetxe, Mikel; (2023) Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training. In: Findings of the Association for Computational Linguistics: ACL 2023. (pp. 5474-5490). Association for Computational Linguistics: Toronto, Canada. Green open access

Full text: Chen_2023.findings-acl.338.pdf (570kB)

Abstract

Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen. Despite learning a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full forward and backward pass over the entire model. We propose mini-model adaptation, a compute-efficient alternative that builds a shallow mini-model from a fraction of a large model’s parameters. New language-specific embeddings can then be efficiently trained over the mini-model and plugged into the aligned large model for rapid cross-lingual transfer. We explore two approaches to learn mini-models: MINIJOINT, which jointly pretrains the primary model and the mini-model using a single transformer with a secondary MLM head at a middle layer; and MINIPOST, where we start from a regular pretrained model, build a mini-model by extracting and freezing a few layers, and learn a small number of parameters on top. Experiments on XNLI, MLQA and PAWS-X show that mini-model adaptation matches the performance of the standard approach using up to 2.3x less compute on average.
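The compute saving described above can be sketched with a back-of-envelope cost model: if training new embeddings requires forward and backward passes through only a k-layer mini-model instead of the full L-layer transformer, the per-step cost falls roughly in proportion to k/L. The function below is an illustrative sketch of that argument only; the 4-of-12 layer split is an assumed example configuration, not necessarily the paper's exact setup, and real savings (such as the reported 2.3x average) also depend on embedding-layer and head costs that this simple ratio ignores.

```python
def adaptation_cost_ratio(mini_layers: int, total_layers: int) -> float:
    """Rough relative cost of one embedding-training step over a
    mini-model with `mini_layers` transformer layers versus the full
    model with `total_layers` layers.

    Assumption (for illustration only): compute scales linearly with
    the number of layers traversed in the forward and backward passes.
    """
    if not 0 < mini_layers <= total_layers:
        raise ValueError("mini_layers must be in (0, total_layers]")
    return mini_layers / total_layers


# Hypothetical example: a 4-layer mini-model cut from a 12-layer MLM.
ratio = adaptation_cost_ratio(4, 12)
speedup = 1 / ratio  # under the linear-cost assumption, ~3x fewer FLOPs per step
```

Under this idealized model a 4-of-12 split would give about a 3x per-step saving; overheads not captured here explain why measured end-to-end savings (up to 2.3x on average in the experiments) are somewhat smaller.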

Type: Proceedings paper
Title: Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training
Event: Findings of the Association for Computational Linguistics: ACL 2023
Dates: Jul 2023
Open access status: An open access version is available from UCL Discovery
DOI: 10.18653/v1/2023.findings-acl.338
Publisher version: https://doi.org/10.18653/v1/2023.findings-acl.338
Language: English
Additional information: © 1963–2025 ACL; other materials are copyrighted by their respective copyright holders. Materials published prior to 2016 are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed under a Creative Commons Attribution 4.0 International License.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10211440
