UCL Discovery
UCL home » Library Services » Electronic resources » UCL Discovery

Automated PDF highlighting to support faster curation of literature for Parkinson's and Alzheimer's disease

Wu, H; Oellrich, A; Girges, C; de Bono, B; Hubbard, TJP; Dobson, RJB; (2017) Automated PDF highlighting to support faster curation of literature for Parkinson's and Alzheimer's disease. Database , 2017 (1) , Article bax027. 10.1093/database/bax027. Green open access

[thumbnail of Dobson_bax027.pdf]
Preview
Text
Dobson_bax027.pdf

Download (1MB) | Preview

Abstract

Neurodegenerative disorders such as Parkinson’s and Alzheimer’s disease are devastating and costly illnesses, a source of major global burden. In order to provide successful interventions for patients and reduce costs, both causes and pathological processes need to be understood. The ApiNATOMY project aims to contribute to our understanding of neurodegenerative disorders by manually curating and abstracting data from the vast body of literature amassed on these illnesses. As curation is labour-intensive, we aimed to speed up the process by automatically highlighting those parts of the PDF document of primary importance to the curator. Using techniques similar to those of summarisation, we developed an algorithm that relies on linguistic, semantic and spatial features. Employing this algorithm on a test set manually corrected for tool imprecision, we achieved a macro F1-measure of 0.51, which is an increase of 132% compared to the best bag-of-words baseline model. A user based evaluation was also conducted to assess the usefulness of the methodology on 40 unseen publications, which reveals that in 85% of cases all highlighted sentences are relevant to the curation task and in about 65% of the cases, the highlights are sufficient to support the knowledge curation task without needing to consult the full text. In conclusion, we believe that these are promising results for a step in automating the recognition of curation-relevant sentences. Refining our approach to pre-digest papers will lead to faster processing and cost reduction in the curation process.

Type: Article
Title: Automated PDF highlighting to support faster curation of literature for Parkinson's and Alzheimer's disease
Open access status: An open access version is available from UCL Discovery
DOI: 10.1093/database/bax027
Publisher version: http://doi.org/10.1093/database/bax027
Language: English
Additional information: Copyright The Author(s) 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits distribution, and reproduction in any medium, provided the original work is properly cited.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > UCL Queen Square Institute of Neurology
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > UCL Queen Square Institute of Neurology > Clinical and Movement Neurosciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics > Clinical Epidemiology
URI: https://discovery.ucl.ac.uk/id/eprint/1552419
Downloads since deposit
80Downloads
Download activity - last month
Download activity - last 12 months
Downloads by country - last 12 months

Archive Staff Only

View Item View Item