Callender, Thomas;
(2023)
Personalising lung cancer screening with machine learning.
Doctoral thesis (Ph.D), UCL (University College London).
Full text: thesis_thomas_callender.pdf (Accepted Version, 9MB)
Abstract
Personalised screening is based on a straightforward concept: repeated risk assessment linked to tailored management. However, delivering such programmes at scale is complex. In this work, I aimed to contribute to two areas: the simplification of risk assessment to facilitate the implementation of personalised screening for lung cancer, and the use of synthetic data to support privacy-preserving analytics in the absence of access to patient records. I first present parsimonious machine learning models for lung cancer screening, demonstrating an approach that couples the performance of model-based risk prediction with the simplicity of risk-factor-based criteria. I trained models to predict the five-year risk of developing or dying from lung cancer using UK Biobank and US National Lung Screening Trial participants, then externally validated them amongst temporally and geographically distinct ever-smokers in the US Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. I found that three predictors (age, smoking duration, and pack-years) within an ensemble machine learning framework matched or exceeded comparator models in discrimination, calibration, and net benefit. Furthermore, I show that these models are more sensitive than risk-factor-based criteria, such as those currently recommended by the US Preventive Services Task Force.

Implementing more personalised healthcare requires that researchers and developers have ready access to high-quality datasets. Because such data are sensitive, their use is tightly controlled, and most of the data held in electronic health records are not available for research. Synthetic data are generated algorithmically but can preserve the statistical relationships present within an original dataset. In this work, I used explicitly privacy-preserving generators to create synthetic versions of the UK Biobank, on which we then performed exploratory data analysis and prognostic model development. Comparing the results obtained with the synthetic and the real datasets, we show the potential of synthetic data to facilitate prognostic modelling.
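The abstract contrasts model-based risk prediction with risk-factor-based criteria such as the USPSTF recommendation. As an illustration only, the sketch below encodes the 2021 USPSTF lung cancer screening criteria (aged 50 to 80, at least 20 pack-years, and currently smoking or quit within the past 15 years) together with the standard pack-year calculation. The names `SmokingHistory`, `pack_years`, and `uspstf_2021_eligible` are hypothetical, the inclusive handling of the 15-year boundary is an assumption, and this is not the thesis's own code or its ensemble models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmokingHistory:
    """Hypothetical record holding the inputs a risk-factor rule needs."""
    age: float                                 # years
    cigarettes_per_day: float                  # average daily consumption while smoking
    smoking_duration: float                    # years smoked
    years_since_quit: Optional[float] = None   # None for people who currently smoke


def pack_years(cigarettes_per_day: float, smoking_duration: float) -> float:
    """Pack-years: packs per day (20 cigarettes per pack) multiplied by years smoked."""
    return (cigarettes_per_day / 20.0) * smoking_duration


def uspstf_2021_eligible(h: SmokingHistory) -> bool:
    """Binary eligibility under the 2021 USPSTF criteria.

    Treating 'quit within the past 15 years' as <= 15 is an assumption.
    """
    in_age_range = 50 <= h.age <= 80
    enough_exposure = pack_years(h.cigarettes_per_day, h.smoking_duration) >= 20
    smokes_or_quit_recently = h.years_since_quit is None or h.years_since_quit <= 15
    return in_age_range and enough_exposure and smokes_or_quit_recently


if __name__ == "__main__":
    # 35 years of smoking at half a pack per day gives 17.5 pack-years.
    long_duration = SmokingHistory(age=62, cigarettes_per_day=10, smoking_duration=35)
    print(pack_years(10, 35))                    # 17.5
    print(uspstf_2021_eligible(long_duration))   # False: below the 20 pack-year threshold

    recent_quitter = SmokingHistory(age=55, cigarettes_per_day=20,
                                    smoking_duration=30, years_since_quit=5)
    print(uspstf_2021_eligible(recent_quitter))  # True: 30 pack-years, quit 5 years ago
```

Because each criterion is a hard threshold, the first person, who has smoked for 35 years, falls outside the rule entirely. A model that weighs age, smoking duration, and pack-years jointly instead assigns a graded risk, which is one way a model-based approach can be more sensitive than a fixed rule.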
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Personalising lung cancer screening with machine learning |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Copyright © The Author 2023. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
UCL classification: | UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Medical Sciences > Div of Medicine > Respiratory Medicine |
URI: | https://discovery.ucl.ac.uk/id/eprint/10175474 |