WikiCREM: A Large Unsupervised Corpus for Coreference Resolution

Kocijan, Vid; Camburu, Oana-Maria; Cretu, Ana-Maria; Yordanov, Yordan; Blunsom, Phil; Lukasiewicz, Thomas; (2019) WikiCREM: A Large Unsupervised Corpus for Coreference Resolution. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). (pp. 4302-4312). Association for Computational Linguistics: Hong Kong, China. Green open access

Full text: wiki.pdf - Published Version (316kB)

Abstract

Pronoun resolution is a major area of natural language understanding. However, large-scale training sets are still scarce, since manually labelling data is costly. In this work, we introduce WikiCREM (Wikipedia CoREferences Masked), a large-scale, yet accurate dataset of pronoun disambiguation instances. We use a language-model-based approach for pronoun resolution in combination with our WikiCREM dataset. We compare a series of models on a collection of diverse and challenging coreference resolution problems, where we match or outperform previous state-of-the-art approaches on 6 out of 7 datasets, such as GAP, DPR, WNLI, PDP, WinoBias, and WinoGender. We release our model to be used off-the-shelf for solving pronoun disambiguation.
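The general idea behind a language-model-based approach to pronoun disambiguation can be sketched as follows: replace the ambiguous pronoun with a mask, substitute each candidate referent into that slot, and pick the candidate whose filled-in sentence the language model scores highest. This is a minimal, generic sketch, not the authors' released model or its API. The `[MASK]` token, the function name `resolve_masked_pronoun`, and the `score` callable (a hypothetical stand-in for a language model's sentence log-probability) are all assumptions for illustration.

```python
from typing import Callable, Sequence

MASK = "[MASK]"  # hypothetical placeholder marking the pronoun slot


def resolve_masked_pronoun(
    text: str,
    candidates: Sequence[str],
    score: Callable[[str], float],
) -> str:
    """Return the candidate whose substitution into the masked slot scores highest.

    `score` is assumed to map a complete sentence to a number where higher
    means "more plausible" (e.g. a log-probability from some language model).
    """
    if MASK not in text:
        raise ValueError("text must contain the mask token")
    if not candidates:
        raise ValueError("at least one candidate is required")
    # Fill the first masked slot with each candidate and keep the best-scoring one.
    return max(candidates, key=lambda c: score(text.replace(MASK, c, 1)))


if __name__ == "__main__":
    sentence = "The trophy doesn't fit in the suitcase because [MASK] is too big."
    # Toy scores standing in for a real language model (hypothetical values).
    toy_scores = {
        "The trophy doesn't fit in the suitcase because the trophy is too big.": -10.0,
        "The trophy doesn't fit in the suitcase because the suitcase is too big.": -12.0,
    }
    best = resolve_masked_pronoun(
        sentence,
        ["the trophy", "the suitcase"],
        lambda s: toy_scores.get(s, float("-inf")),
    )
    print(best)
```

Passing the scorer in as a callable keeps the selection logic independent of any particular language model; in practice the toy dictionary would be replaced by a model's scoring of each filled-in sentence.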

Type: Proceedings paper
Title: WikiCREM: A Large Unsupervised Corpus for Coreference Resolution
Event: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Dates: November 2019
Open access status: An open access version is available from UCL Discovery
DOI: 10.18653/v1/d19-1439
Publisher version: http://dx.doi.org/10.18653/v1/d19-1439
Language: English
Additional information: ACL materials are Copyright © 1963–2024 ACL; other materials are copyrighted by their respective copyright holders. Materials published before 2016 are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed under a Creative Commons Attribution 4.0 International License.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10184087