eprintid: 10129703
rev_number: 14
eprint_status: archive
userid: 608
dir: disk0/10/12/97/03
datestamp: 2021-06-17 14:59:33
lastmod: 2021-10-19 22:18:01
status_changed: 2021-06-17 14:59:33
type: article
metadata_visibility: show
creators_name: Malhotra, A
creators_name: Rachet, B
creators_name: Bonaventure, A
creators_name: Pereira, SP
creators_name: Woods, LM
title: Can we screen for pancreatic cancer? Identifying a sub-population of patients at high risk of subsequent diagnosis using machine learning techniques applied to primary care data.
ispublished: pub
divisions: UCL
divisions: B02
divisions: C10
divisions: D17
divisions: G91
note: © 2021 Malhotra et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
abstract: BACKGROUND: Pancreatic cancer (PC) represents a substantial public health burden. Pancreatic cancer patients have very low survival due to the difficulty of identifying cancers early, when the tumour is localised to the site of origin and treatable. Recent progress has been made in identifying biomarkers for PC in the blood and urine, but these cannot be used for population-based screening as this would be prohibitively expensive and potentially harmful. METHODS: We conducted a case-control study using prospectively-collected electronic health records from primary care individually-linked to cancer registrations. Our cases comprised 1,139 patients, aged 15-99 years, diagnosed with pancreatic cancer between January 1, 2005 and June 30, 2009. Each case was age-, sex- and diagnosis time-matched to four non-pancreatic (cancer patient) controls. Disease and prescription codes for the 24 months prior to diagnosis were used to identify 57 individual symptoms. Using a machine learning approach, we trained a logistic regression model on 75% of the data to predict patients who later developed PC and tested the model's performance on the remaining 25%. RESULTS: We were able to identify 41.3% of patients ≤60 years at 'high risk' of developing pancreatic cancer up to 20 months prior to diagnosis with 72.5% sensitivity, 59% specificity and 66% AUC. 43.2% of patients >60 years were similarly identified at 17 months, with 65% sensitivity, 57% specificity and 61% AUC. We estimate that combining our algorithm with currently available biomarker tests could result in 30 older and 400 younger patients per cancer being identified as 'potential patients', and the earlier diagnosis of around 60% of tumours. CONCLUSION: After further work this approach could be applied in the primary care setting and has the potential to be used alongside a non-invasive biomarker test to increase earlier diagnosis. This would result in a greater number of patients surviving this devastating disease.
date: 2021-06-02
date_type: published
official_url: https://doi.org/10.1371/journal.pone.0251876
oa_status: green
full_text_type: pub
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 1871146
doi: 10.1371/journal.pone.0251876
pii: PONE-D-20-38213
lyricists_name: Pereira, Stephen
lyricists_id: SPPER57
actors_name: Pereira, Stephen
actors_id: SPPER57
actors_role: owner
full_text_status: public
publication: PLoS One
volume: 16
number: 6
article_number: e0251876
event_location: United States
citation: Malhotra, A; Rachet, B; Bonaventure, A; Pereira, SP; Woods, LM; (2021) Can we screen for pancreatic cancer? Identifying a sub-population of patients at high risk of subsequent diagnosis using machine learning techniques applied to primary care data. PLoS One, 16 (6), Article e0251876. 10.1371/journal.pone.0251876 <https://doi.org/10.1371/journal.pone.0251876>. Green open access

document_url: https://discovery.ucl.ac.uk/id/eprint/10129703/1/journal.pone.0251876.pdf