UCL Discovery

Domain-matched Pre-training Tasks for Dense Retrieval

Oguz, B; Lakhotia, K; Gupta, A; Lewis, P; Karpukhin, V; Piktus, A; Chen, X; ... Mehdad, Y (2022) Domain-matched Pre-training Tasks for Dense Retrieval. In: Findings of the Association for Computational Linguistics: NAACL 2022. (pp. 1524-1534). Association for Computational Linguistics: Seattle, United States. Green open access

PDF: 2022.findings-naacl.114.pdf - Published Version (283kB)

Abstract

Pre-training on larger datasets with ever increasing model size is now a proven recipe for increased performance across almost all NLP tasks. A notable exception is information retrieval, where additional pre-training has so far failed to produce convincing results. We show that, with the right pre-training setup, this barrier can be overcome. We demonstrate this by pre-training large bi-encoder models on 1) a recently released set of 65 million synthetically generated questions, and 2) 200 million post-comment pairs from a preexisting dataset of Reddit conversations. We evaluate on a set of information retrieval and dialogue retrieval benchmarks, showing substantial improvements over supervised baselines.
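
The abstract refers to bi-encoder models but does not spell out a training objective. As a rough illustration only, the sketch below assumes a common setup for dense retrieval: queries and passages are embedded separately, relevance is their dot product, and each query's paired passage is contrasted against the other passages in the batch. The function names (`in_batch_negative_loss`, `retrieve`) and the toy two-dimensional embeddings are hypothetical and are not taken from the paper.

```python
import math


def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_negative_loss(query_vecs, passage_vecs):
    """Mean cross-entropy where passage i is the positive for query i and
    every other passage in the batch acts as a negative (an assumed objective)."""
    assert len(query_vecs) == len(passage_vecs)
    total = 0.0
    for i, q in enumerate(query_vecs):
        scores = [dot(q, p) for p in passage_vecs]
        # Numerically stable log-sum-exp over all passages in the batch.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]
    return total / len(query_vecs)


def retrieve(query_vec, passage_vecs, k=1):
    """Return the indices of the k passages with the highest dot-product score."""
    ranked = sorted(
        range(len(passage_vecs)),
        key=lambda j: dot(query_vec, passage_vecs[j]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings standing in for the outputs of a query and a passage encoder.
queries = [[1.0, 0.0], [0.0, 1.0]]
passages = [[2.0, 0.0], [0.0, 2.0]]

loss_aligned = in_batch_negative_loss(queries, passages)
loss_swapped = in_batch_negative_loss(queries, passages[::-1])
top_hit = retrieve(queries[0], passages)
```

With correctly paired embeddings the loss is lower than with the pairs swapped, which is the signal such an objective would push an encoder toward. In practice the vectors would come from trained encoders and the search would run over a large passage index rather than a Python list.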

Type: Proceedings paper
Title: Domain-matched Pre-training Tasks for Dense Retrieval
Event: NAACL 2022: Annual Conference of the North American Chapter of the Association for Computational Linguistics
ISBN-13: 9781955917766
Open access status: An open access version is available from UCL Discovery
DOI: 10.18653/v1/2022.findings-naacl.114
Publisher version: https://doi.org/10.18653/v1/2022.findings-naacl.11...
Language: English
Additional information: This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL
URI: https://discovery.ucl.ac.uk/id/eprint/10156408
