Liu, Lu;
Vlachidis, Andreas;
Crymble, Adam;
Lee, Deborah;
Humbel, Marco;
(2025)
Towards Comparable Historical NER: Building a Shared Evaluation Corpus for 18th-Century Historical Texts.
In:
Anthology of Computers and the Humanities.
(pp. 968-982).
Anthology of Computers and the Humanities
Text: Towards Comparable Historical NER Building a Shared Evaluation Corpus for 18th-Century Historical Texts.pdf - Published Version (286kB)
Abstract
Named Entity Recognition (NER) is increasingly applied to historical text analysis. However, differences in evaluation materials, metrics, and annotation guidelines across existing NER projects make it difficult to systematically compare different approaches to historical NER. This study addresses this issue by constructing an evaluation corpus through the normalization of four annotated datasets from the long 18th century. We evaluate the performance of the Edinburgh Geoparser, spaCy, and a BERT-based tool on this corpus using five evaluation modes. Results show that even under the most lenient criteria, the highest F1-score remains below 70%, highlighting the challenges of applying existing NER systems to historical texts. Through detailed error analysis, we identify common challenges such as spelling and formatting issues. These findings demonstrate the limitations of NER tools on historical documents. We argue that future work should involve collaboration with historians to ensure that evaluation corpora align with real user needs.
| Type: | Proceedings paper |
|---|---|
| Title: | Towards Comparable Historical NER: Building a Shared Evaluation Corpus for 18th-Century Historical Texts |
| Event: | Computational Humanities Research (CHR) 2025 |
| Location: | Luxembourg |
| Dates: | 9 Dec 2025 - 12 Dec 2025 |
| Open access status: | An open access version is available from UCL Discovery |
| DOI: | 10.63744/dwCJ80qwvAtr |
| Publisher version: | https://anthology.ach.org/volumes/vol0003/towards-... |
| Language: | English |
| Additional information: | https://creativecommons.org/licenses/by/4.0/?ref=chooser-v1 © 2025 by the authors. Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). |
| Keywords: | named entity recognition, evaluation corpus, historical documents, digital humanities, natural language processing |
| UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL SLASH; UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities; UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities > Dept of Information Studies |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10217464 |