Excavating grey literature: A case study on the rich indexing of archaeological documents via natural language‐processing techniques and knowledge‐based resources

Broughton, V; Vlachidis, A; Binding, C; Tudhope, D; May, K; (2010) Excavating grey literature: A case study on the rich indexing of archaeological documents via natural language‐processing techniques and knowledge‐based resources. Aslib Proceedings, 62 (4-5), pp. 466-475. DOI: 10.1108/00012531011074708. Green open access

Abstract

PURPOSE: This paper discusses the use of information extraction (IE), a natural language‐processing (NLP) technique, to assist “rich” semantic indexing of diverse archaeological text resources. The focus of the research is to direct a semantic‐aware “rich” indexing of diverse natural language resources with properties capable of satisfying information retrieval from online publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project.

DESIGN/METHODOLOGY/APPROACH: The paper proposes use of the English Heritage extension (CRM‐EH) of the standard core ontology in cultural heritage, CIDOC CRM, and exploitation of domain thesauri resources for driving and enhancing an Ontology‐Oriented Information Extraction process. The process of semantic indexing is based on a rule‐based Information Extraction technique, which is facilitated by the General Architecture for Text Engineering (GATE) toolkit and expressed by Java Annotation Pattern Engine (JAPE) rules.

FINDINGS: Initial results suggest that the combination of information extraction with knowledge resources and standard conceptual models is capable of supporting semantic‐aware term indexing. Additional efforts are required for further exploitation of the technique and adoption of formal evaluation methods for assessing the performance of the method in measurable terms.

ORIGINALITY/VALUE: The value of the paper lies in the semantic indexing of 535 unpublished online documents, often referred to as “Grey Literature”, from the Archaeology Data Service OASIS corpus (Online AccesS to the Index of archaeological investigationS), with respect to the CRM ontological concepts E49.Time Appellation and E19.Physical Object.
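
As a rough illustration of the rule‐based approach the abstract describes, the sketch below looks up terms from small word lists and tags each match with a CIDOC CRM class label (E49 Time Appellation, E19 Physical Object). It is not the GATE/JAPE pipeline used in the paper: the term lists, the `Annotation` type, and the `annotate` function are all hypothetical, chosen only to show how thesaurus entries might drive concept‐level annotations.

```python
import re
from typing import NamedTuple

# Hypothetical mini word lists standing in for domain thesauri.
# The keys are CIDOC CRM class labels; the terms are illustrative only.
GAZETTEER = {
    "E49.Time Appellation": ["roman", "medieval", "bronze age", "iron age"],
    "E19.Physical Object": ["pottery", "coin", "brooch", "flint"],
}


class Annotation(NamedTuple):
    start: int     # character offset where the match begins
    end: int       # character offset just past the match
    text: str      # matched surface form as it appears in the document
    concept: str   # ontology class assigned to the match


def build_pattern(terms):
    # Longest terms first so multi-word entries are preferred over shorter ones.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def annotate(text):
    """Return annotations for every word-list term found in text, in document order."""
    annotations = []
    for concept, terms in GAZETTEER.items():
        for match in build_pattern(terms).finditer(text):
            annotations.append(
                Annotation(match.start(), match.end(), match.group(0), concept)
            )
    return sorted(annotations)


if __name__ == "__main__":
    sample = "A Roman coin was found above a Bronze Age ditch."
    for ann in annotate(sample):
        print(f"{ann.text!r} -> {ann.concept}")
```

A real pipeline would add contextual rules (for example, combining a period term with a following object term) on top of plain lookup; this sketch covers only the lookup step.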

Type: Article
Title: Excavating grey literature: A case study on the rich indexing of archaeological documents via natural language‐processing techniques and knowledge‐based resources
Location: Emerald Group Publishing Limited
Open access status: An open access version is available from UCL Discovery
DOI: 10.1108/00012531011074708
Publisher version: https://doi.org/10.1108/00012531011074708
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Information management, Semantics, Data handling
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL SLASH
UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities
UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities > Dept of Information Studies
URI: https://discovery.ucl.ac.uk/id/eprint/1556228