UCL Discovery

Towards Comparable Historical NER: Building a Shared Evaluation Corpus for 18th-Century Historical Texts

Liu, Lu; Vlachidis, Andreas; Crymble, Adam; Lee, Deborah; Humbel, Marco; (2025) Towards Comparable Historical NER: Building a Shared Evaluation Corpus for 18th-Century Historical Texts. In: Anthology of Computers and the Humanities. (pp. 968-982). Anthology of Computers and the Humanities. Green open access

Towards Comparable Historical NER Building a Shared Evaluation Corpus for 18th-Century Historical Texts.pdf - Published Version

Abstract

Named Entity Recognition (NER) is increasingly applied to historical text analysis. However, differences in evaluation materials, metrics, and annotation guidelines across existing NER projects make it difficult to systematically compare different approaches to historical NER. This study addresses this issue by constructing an evaluation corpus through the normalization of four annotated datasets from the long 18th century. We evaluate the performance of the Edinburgh Geoparser, spaCy, and a BERT-based tool on this corpus using five evaluation modes. Results show that even under the most lenient criteria, the highest F1-score remains below 70%, highlighting the challenges of applying existing NER systems to historical texts. Through detailed error analysis, we identify common challenges such as spelling and formatting issues. These findings demonstrate the limitations of NER tools when applied to historical documents. We argue that future work should involve collaboration with historians to ensure that evaluation corpora align with real user needs.

Type: Proceedings paper
Title: Towards Comparable Historical NER: Building a Shared Evaluation Corpus for 18th-Century Historical Texts
Event: Computational Humanities Research (CHR) 2025
Location: Luxembourg
Dates: 9 Dec 2025 - 12 Dec 2025
Open access status: An open access version is available from UCL Discovery
DOI: 10.63744/dwCJ80qwvAtr
Publisher version: https://anthology.ach.org/volumes/vol0003/towards-...
Language: English
Additional information: https://creativecommons.org/licenses/by/4.0/?ref=chooser-v1 © 2025 by the authors. Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
Keywords: named entity recognition, evaluation corpus, historical documents, digital humanities, natural language processing
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL SLASH
UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities
UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities > Dept of Information Studies
URI: https://discovery.ucl.ac.uk/id/eprint/10217464