UCL Discovery

A Method for Archaeological and Dendrochronological Concept Annotation using Domain Knowledge in Information Extraction

Vlachidis, A; Tudhope, D (2022) A Method for Archaeological and Dendrochronological Concept Annotation using Domain Knowledge in Information Extraction. International Journal of Metadata, Semantics and Ontologies, 15 (3) pp. 192-203. 10.1504/IJMSO.2021.123042. Green open access

Abstract

Advances in Natural Language Processing allow the process of deriving information from large volumes of text to be automated. Attention is turned to one of the most important, but traditionally difficult to access resources in archaeology, commonly known as “grey literature”. This paper presents the development of two separate Named-Entity Recognition (NER) pipelines aimed at the extraction of Archaeological and of Dendrochronological concepts in Dutch, respectively. The role of domain vocabulary is discussed for the development of a Knowledge Organization System (KOS)-driven, Rule-Based method of NER which makes complementary use of ontology, thesauri and domain vocabulary for information extraction and attribute assignment of semantic annotations. The NER task is challenged by a series of domain and language-oriented aspects and evaluated against a human-annotated Gold Standard. The results suggest the suitability of Rule-Based, KOS-driven approaches for attaining the low-hanging fruit of NER, using a combination of quality vocabulary and rules.

Type: Article
Title: A Method for Archaeological and Dendrochronological Concept Annotation using Domain Knowledge in Information Extraction
Open access status: An open access version is available from UCL Discovery
DOI: 10.1504/IJMSO.2021.123042
Publisher version: https://doi.org/10.1504/IJMSO.2021.123042
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Information Extraction, Knowledge Organization Systems, Named Entity Recognition, Archaeology, Dendrochronology, Grey Literature, Semantic Annotation
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL SLASH
UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities
UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities > Dept of Information Studies
URI: https://discovery.ucl.ac.uk/id/eprint/10140408