Normalisation of imprecise temporal expressions extracted from text

Tissot, H; Del Fabro, MD; Derczynski, L; Roberts, A (2019) Normalisation of imprecise temporal expressions extracted from text. Knowledge and Information Systems. DOI: 10.1007/s10115-019-01338-1. (In press). Green open access

Text
ITN-Paper-Tissot_et_al-2019-KAIS.pdf - Published Version

Abstract

Information extraction systems and techniques have been widely used to deal with the increasing amount of unstructured data now available. Time is among the different kinds of information that may be extracted from such unstructured data sources, including text documents. However, the inability to correctly identify and extract temporal information from text makes it difficult to understand how the extracted events are organised in chronological order. Furthermore, in many situations, the meaning of temporal expressions (timexes) is imprecise, such as in “less than 2 years” and “several weeks”, and cannot be accurately normalised, leading to interpretation errors. Although there are some approaches that enable representing imprecise timexes, they are not designed to be applied to specific scenarios and are difficult to generalise. This paper presents a novel methodology to analyse and normalise imprecise temporal expressions by representing temporal imprecision in the form of membership functions, based on human interpretation of time in two different languages (Portuguese and English). Each resulting model is a generalisation of probability distributions in the form of trapezoidal and hexagonal fuzzy membership functions. We use an adapted F1-score to guide the choice of the best models for each kind of imprecise timex and a weighted F1-score (F1_3D) as a complementary metric in order to identify relevant differences when comparing two normalisation models. We apply the proposed methodology to three distinct classes of imprecise timexes, and the resulting models give distinct insights into the way each kind of temporal expression is interpreted.
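
As an illustration of the trapezoidal membership functions mentioned in the abstract, the following is a minimal Python sketch of the standard textbook definition. It is not taken from the paper: the function name `trapezoid` and the breakpoints chosen for “several weeks” are hypothetical values picked only to show the shape, and the paper's fitted models may differ.

```python
def trapezoid(x, a, b, c, d):
    """Standard trapezoidal fuzzy membership function.

    Returns 0 outside [a, d], 1 on the plateau [b, c], and a linear
    ramp on the two shoulders. Requires a <= b <= c <= d.
    """
    if not (a <= b <= c <= d):
        raise ValueError("breakpoints must satisfy a <= b <= c <= d")
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        # Rising shoulder between a and b (b > a is guaranteed here).
        return (x - a) / (b - a)
    # Falling shoulder between c and d (d > c is guaranteed here).
    return (d - x) / (d - c)


# Hypothetical breakpoints, in weeks, for the timex "several weeks".
# These numbers are assumptions for illustration, not results from the paper.
SEVERAL_WEEKS = (2.0, 3.0, 6.0, 10.0)

for weeks in (1, 2.5, 4, 8, 12):
    degree = trapezoid(weeks, *SEVERAL_WEEKS)
    print(f"{weeks} weeks -> membership {degree:.2f}")
```

Under these assumed breakpoints, a duration of 4 weeks fully belongs to “several weeks”, 8 weeks belongs to it only partially, and 12 weeks does not belong at all. A normalised imprecise timex is then a graded range of values rather than a single value.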

Type: Article
Title: Normalisation of imprecise temporal expressions extracted from text
Open access status: An open access version is available from UCL Discovery
DOI: 10.1007/s10115-019-01338-1
Publisher version: https://doi.org/10.1007/s10115-019-01338-1
Language: English
Additional information: Copyright © The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords: Natural language processing (NLP), Information extraction, Temporal expression (timex), Imprecise timex normalisation
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics
URI: https://discovery.ucl.ac.uk/id/eprint/10069317