Gammall, Jurgita;
(2025)
Pan-cancer predictive survival modelling using clinical, pathological and genetic data.
Doctoral thesis (Ph.D), UCL (University College London).
Text: Gammal_10209789_Thesis.pdf (44MB)
Abstract
Background: The growing burden of cancer and the recent surge in healthcare data availability call for new ways of analysing this complex disease and improving patient outcomes. This PhD aims to discover, at the pan-cancer level, important factors and associations that can help inform and improve cancer prognosis, using linked genetic and routinely collected electronic health record data together with advanced computational methods.
Methods: To examine the existing literature, 2,824 publications were identified and a systematic review was performed on 247 selected articles. The data analyses presented in this thesis included data from 9,977 patients with bladder, breast, colorectal, endometrial, glioma, leukaemia, lung, ovarian, prostate, and renal cancers. Genetic data collected through the 100,000 Genomes Project were linked with clinical and demographic data provided by the National Cancer Registration and Analysis Service (NCRAS), Hospital Episode Statistics (HES) and the Office for National Statistics (ONS). Descriptive and Kaplan-Meier survival analyses were performed to visualise similarities and differences across cancer types. Cox proportional hazards regression models were applied to identify statistically significant associations between prognostic factors and overall survival. Four machine learning models were developed to predict cancer survival: Elastic Net Cox proportional hazards regression, random survival forest, gradient boosting survival and the DeepSurv neural network.
Results: The systematic review identified 440 gene variations (somatic mutations and germline single nucleotide variants) and 238 clinicopathological prognostic factors as important in cancer survival. Of those, more than 500 factors were assessed in data analysis and model development. Across the ten cancer types, 116 unique factors had a significant prognostic effect on overall survival when adjusted for age, sex and stage. The findings confirmed prognostic associations with overall survival identified in previous studies for factors such as multimorbidity, tumour mutational burden, and mutations in the genes BRAF, CDH1, NF1, NRAS, PIK3CA, PTEN and TP53. The results also identified new prognostic associations with overall survival for factors such as referral routes, waiting times, previous hospital encounters and mutations in the genes FANCE, FBXW7, GATA3, MSH6, PTPN11, RB1 and RNF43. Most predictive models achieved good performance, with an average C-index of 72%. The different machine learning methods performed similarly, with the DeepSurv model slightly underperforming the others. Adding genetic data improved performance in endometrial, glioma, ovarian and prostate cancers, showing its potential importance for cancer prognosis.
Conclusion: The results from this PhD project contribute to the understanding of cancer and could be used by researchers to further test and build the knowledge base on prognostic factors for cancer survival. The findings draw attention to certain healthcare practice-related prognostic factors, such as referral route, waiting times and previous hospital encounters, emphasising the importance of early cancer diagnosis and timely treatment. Predictive machine learning models could be used in clinical practice to increase the accuracy of cancer prognosis and consequently contribute to improved patient outcomes.
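
For readers unfamiliar with the C-index reported in the abstract, the following is a minimal sketch of how Harrell's concordance index can be computed for right-censored survival data. It is not the thesis code: the function name `concordance_index`, the toy follow-up times, event flags and risk scores are all invented for illustration, and pairs with tied observed times are simply skipped as a simplifying assumption. A value of 0.72 would correspond to the 72% average quoted in the abstract; 0.5 corresponds to random ordering.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable patient pairs whose predicted
    risk ordering agrees with their observed survival ordering."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Tied times are skipped here (a simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk for the earlier death: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up time in months, 1 = death observed,
# 0 = censored, and a model's predicted risk score per patient.
times = [5, 8, 12, 20, 25]
events = [1, 1, 0, 1, 0]
risk_scores = [0.9, 0.4, 0.7, 0.3, 0.1]

c_index = concordance_index(times, events, risk_scores)
print(f"C-index: {c_index:.3f}")  # prints "C-index: 0.875"
```

In this toy example, 7 of the 8 comparable pairs are ordered correctly, giving 0.875. In practice an established library implementation would normally be used rather than a hand-written loop.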
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Pan-cancer predictive survival modelling using clinical, pathological and genetic data |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics |
URI: | https://discovery.ucl.ac.uk/id/eprint/10209789 |