UCL Discovery

A pilot investigation of information extraction in the semantic annotation of archaeological reports

Vlachidis, A; Tudhope, D (2012) A pilot investigation of information extraction in the semantic annotation of archaeological reports. International Journal of Metadata, Semantics and Ontologies, 7 (3) pp. 222-235. 10.1504/IJMSO.2012.050183. Green open access

Abstract

The paper discusses a prototype investigation of semantic annotation, a form of metadata that assigns conceptual entities to textual instances, in this case within archaeological grey literature. Information Extraction (IE), a Natural Language Processing (NLP) technique, is central to the annotation process, while Knowledge Organization Systems (KOS) are explored as a means of associating semantic annotations with both ontological and terminological references. The annotation process follows a rule-based information extraction approach using the GATE NLP toolkit, together with the CIDOC CRM ontology, its CRM-EH archaeological extension, and English Heritage thesauri and glossaries. Results from an initial evaluation suggest that these information extraction techniques can be applied to archaeological grey literature reports. Further work is discussed, drawing on the evaluation and on the characteristics of the archaeology domain.
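
The kind of rule-based annotation the abstract describes can be illustrated with a minimal sketch. It is not the authors' pipeline: in GATE, rules of this kind are typically written in JAPE over gazetteer lookups, and plain Python is used here only as a stand-in. The term list, the concept labels (illustrative picks of CIDOC CRM entity names) and the function names `annotate` and `dated_finds` are all assumptions made for the example, not taken from the paper or from the English Heritage thesauri.

```python
import re
from dataclasses import dataclass

# Concept labels: illustrative CIDOC CRM entity names (an assumption,
# not the mapping used in the paper).
PERIOD = "E49 Time Appellation"
OBJECT = "E19 Physical Object"
PLACE = "E53 Place"

# Hypothetical mini-gazetteer standing in for thesaurus and glossary terms.
GAZETTEER = {
    "roman": PERIOD,
    "medieval": PERIOD,
    "pottery": OBJECT,
    "coin": OBJECT,
    "pit": PLACE,
    "ditch": PLACE,
}

@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the match begins
    end: int      # character offset just past the match
    text: str     # matched surface form
    concept: str  # concept label assigned to the match

# Whole-word, case-insensitive pattern built from the gazetteer terms.
_TERMS = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in GAZETTEER) + r")\b",
    re.IGNORECASE,
)

def annotate(text):
    """Lookup step: tag every gazetteer term with its concept label."""
    return [
        Annotation(m.start(), m.end(), m.group(0), GAZETTEER[m.group(0).lower()])
        for m in _TERMS.finditer(text)
    ]

def dated_finds(text, annotations):
    """Rule step: a period term directly followed by an object term
    (only whitespace in between) is reported as a dated find."""
    pairs = []
    for a, b in zip(annotations, annotations[1:]):
        only_space_between = text[a.end:b.start].strip() == ""
        if a.concept == PERIOD and b.concept == OBJECT and only_space_between:
            pairs.append((a.text, b.text))
    return pairs

sentence = "Roman pottery was found in the pit."
annotations = annotate(sentence)
for ann in annotations:
    print(ann)
print(dated_finds(sentence, annotations))
```

The two functions separate vocabulary lookup from pattern rules, so the term list could be swapped for a real controlled vocabulary without touching the rule logic.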

Type: Article
Title: A pilot investigation of information extraction in the semantic annotation of archaeological reports
Open access status: An open access version is available from UCL Discovery
DOI: 10.1504/IJMSO.2012.050183
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: NLP; natural language processing; KOS; knowledge organisation systems; semantic annotation; information extraction; GATE; digital archaeology; grey literature; CIDOC CRM ontology; archaeological reports; metadata.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL SLASH
UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities
UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities > Dept of Information Studies
URI: https://discovery.ucl.ac.uk/id/eprint/1556223