Oguz, B;
Lakhotia, K;
Gupta, A;
Lewis, P;
Karpukhin, V;
Piktus, A;
Chen, X;
... Mehdad, Y;
(2022)
Domain-matched Pre-training Tasks for Dense Retrieval.
In:
Findings of the Association for Computational Linguistics: NAACL 2022.
(pp. 1524-1534).
Association for Computational Linguistics: Seattle, United States.
PDF: 2022.findings-naacl.114.pdf - Published Version (283kB)
Abstract
Pre-training on larger datasets with ever increasing model size is now a proven recipe for increased performance across almost all NLP tasks. A notable exception is information retrieval, where additional pre-training has so far failed to produce convincing results. We show that, with the right pre-training setup, this barrier can be overcome. We demonstrate this by pre-training large bi-encoder models on 1) a recently released set of 65 million synthetically generated questions, and 2) 200 million post-comment pairs from a preexisting dataset of Reddit conversations. We evaluate on a set of information retrieval and dialogue retrieval benchmarks, showing substantial improvements over supervised baselines.
Type: | Proceedings paper |
---|---|
Title: | Domain-matched Pre-training Tasks for Dense Retrieval |
Event: | NAACL 2022: Annual Conference of the North American Chapter of the Association for Computational Linguistics |
ISBN-13: | 9781955917766 |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.18653/v1/2022.findings-naacl.114 |
Publisher version: | https://doi.org/10.18653/v1/2022.findings-naacl.11... |
Language: | English |
Additional information: | This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL |
URI: | https://discovery.ucl.ac.uk/id/eprint/10156408 |



