Vlachidis, A;
Binding, C;
Tudhope, D;
(2013)
Information Extraction Techniques for the Purposes of Semantic Indexing of Archaeological Resources.
In: Richards, J, (ed.)
Digital Heritage 2013: Interfaces with the Past.
Centre for Digital Heritage, University of York: York, UK.
Abstract
The paper describes the use of Information Extraction (IE), a Natural Language Processing (NLP) technique, to assist ‘rich’ semantic indexing of diverse archaeological text resources. Such unpublished online documents are often referred to as ‘Grey Literature’. Established document indexing techniques are not sufficient to satisfy user information needs that extend beyond the limits of a simple term-matching search. The focus of the research is to direct a semantic-aware ‘rich’ indexing of diverse natural language resources, with properties capable of satisfying information retrieval from on-line publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project in the UoG Hypermedia Research Unit. The study proposes the use of knowledge resources and conceptual models to assist an Information Extraction process able to provide ‘rich’ semantic indexing of archaeological documents and to resolve linguistic ambiguities of indexed terms. CRM CIDOC-EH, a standard core ontology in cultural heritage, and the English Heritage (EH) Thesauri for archaeological concepts are employed to drive the Information Extraction process and to support the aims of a semantic framework in which indexed terms support semantic-aware access to on-line resources. The paper describes the process of semantic indexing of archaeological concepts (periods and finds) in a corpus of 535 grey literature documents using a rule-based Information Extraction technique facilitated by the General Architecture for Text Engineering (GATE) toolkit and expressed by Java Annotation Pattern Engine (JAPE) rules. Illustrative examples demonstrate the different stages of the process. Initial results suggest that the combination of information extraction with knowledge resources and standard core conceptual models is capable of supporting semantic-aware and linguistically disambiguated term indexing.
Type: | Proceedings paper |
---|---|
Title: | Information Extraction Techniques for the Purposes of Semantic Indexing of Archaeological Resources |
Event: | Digital Heritage 2013: Interfaces with the Past, 6 July 2013, York, UK |
Location: | York, UK |
Open access status: | An open access version is available from UCL Discovery |
Publisher version: | https://www.york.ac.uk/digital-heritage/events/cdh... |
Language: | English |
Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. |
Keywords: | Natural Language Processing, Ontology Based Information Extraction, Semantic Annotations, CIDOC, Conceptual Reference Model |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities > Dept of Information Studies |
URI: | https://discovery.ucl.ac.uk/id/eprint/10045620 |