UCL Discovery

Information Extraction Techniques for the Purposes of Semantic Indexing of Archaeological Resources

Vlachidis, A; Binding, C; Tudhope, D; (2013) Information Extraction Techniques for the Purposes of Semantic Indexing of Archaeological Resources. In: Richards, J, (ed.) Digital Heritage 2013: Interfaces with the Past. Centre for Digital Heritage, University of York: York, UK. Green open access

Vlachidis_Information_Extraction_Techniques.pdf - Accepted Version


Abstract

The paper describes the use of Information Extraction (IE), a Natural Language Processing (NLP) technique, to assist ‘rich’ semantic indexing of diverse archaeological text resources. Such unpublished online documents are often referred to as ‘Grey Literature’. Established document indexing techniques are not sufficient to satisfy user information needs that extend beyond simple term-matching search. The research aims at semantic-aware ‘rich’ indexing of diverse natural language resources, with properties capable of supporting information retrieval from online publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project in the UoG Hypermedia Research Unit. The study proposes the use of knowledge resources and conceptual models to assist an Information Extraction process that provides ‘rich’ semantic indexing of archaeological documents and can resolve linguistic ambiguities in indexed terms. CRM CIDOC-EH, a standard core ontology in cultural heritage, and the English Heritage (EH) Thesauri for archaeological concepts are employed to drive the Information Extraction process and to support a semantic framework in which indexed terms enable semantic-aware access to online resources. The paper describes the semantic indexing of archaeological concepts (periods and finds) in a corpus of 535 grey literature documents using a rule-based Information Extraction technique facilitated by the General Architecture for Text Engineering (GATE) toolkit and expressed as Java Annotation Pattern Engine (JAPE) rules. Illustrative examples demonstrate the different stages of the process. Initial results suggest that combining information extraction with knowledge resources and standard core conceptual models can support semantic-aware and linguistically disambiguated term indexing.
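As a rough illustration of the approach outlined in the abstract, the sketch below combines a vocabulary lookup with a single contextual rule to annotate period and find mentions. It is written in plain Python and is not GATE or JAPE code. The term lists, the `Annotation` type, the function names and the adjacency rule are all assumptions made for this example, not the vocabulary or rules used in the paper.

```python
import re
from typing import NamedTuple

# Hypothetical mini-vocabularies for illustration only; the paper draws its
# terms from the English Heritage Thesauri, which are not reproduced here.
PERIOD_TERMS = {"roman", "medieval", "bronze age", "iron age"}
FIND_TERMS = {"coin", "pottery", "brooch", "flint"}


class Annotation(NamedTuple):
    start: int
    end: int
    text: str
    concept: str  # illustrative label, not an actual ontology class identifier


def lookup(text: str, terms: set[str], concept: str) -> list[Annotation]:
    """Vocabulary step: annotate every whole-word occurrence of a known term."""
    annotations = []
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append(
                Annotation(match.start(), match.end(), match.group(), concept)
            )
    return sorted(annotations)  # ordered by start offset


def period_find_pairs(text: str) -> list[tuple[Annotation, Annotation]]:
    """Rule step (assumed rule): a period mention followed directly by a find
    mention, with only whitespace between them, forms one dated-find phrase."""
    periods = lookup(text, PERIOD_TERMS, "Period")
    finds = lookup(text, FIND_TERMS, "Find")
    pairs = []
    for period in periods:
        for find in finds:
            # The slice is empty (so isspace() is False) unless the find
            # starts after the period ends.
            if text[period.end:find.start].isspace():
                pairs.append((period, find))
    return pairs
```

Calling `period_find_pairs("A Roman coin and medieval pottery were found near a flint scatter.")` pairs "Roman" with "coin" and "medieval" with "pottery", while "flint" is left unpaired because no period term precedes it. A contextual rule of this kind is one simple way that surrounding text can help decide how an ambiguous term should be indexed.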

Type: Proceedings paper
Title: Information Extraction Techniques for the Purposes of Semantic Indexing of Archaeological Resources
Event: Digital Heritage 2013: Interfaces with the Past, 6 July 2013, York, UK
Location: York, UK
Open access status: An open access version is available from UCL Discovery
Publisher version: https://www.york.ac.uk/digital-heritage/events/cdh...
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Natural Language Processing, Ontology Based Information Extraction, Semantic Annotations, CIDOC, Conceptual Reference Model
UCL classification: UCL
UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > UCL SLASH
UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities
UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities > Dept of Information Studies
URI: https://discovery.ucl.ac.uk/id/eprint/10045620
Downloads since deposit: 27
