UCL Discovery

Sample size for multivariable prognostic models

Jinks, RC; (2012) Sample size for multivariable prognostic models. Doctoral thesis, UCL (University College London). Green open access.

Available under licence: see the attached licence file.



Prognosis is one of the central principles of medical practice; useful prognostic models are vital if clinicians are to predict patient outcomes with any success. However, prognostic studies are often performed retrospectively, which can result in poorly validated models that do not become valuable clinical tools. One obstacle to planning prospective studies is the lack of sample size calculations for developing or validating multivariable models. The commonly used rule of 5 or 10 events per variable (EPV) (Peduzzi and Concato, 1995) can result in small sample sizes, which may lead to overfitting and optimistic estimates of model performance. This thesis investigates the issue of sample size in prognostic modelling and develops calculations and recommendations that may improve prognostic study design.

To develop multivariable prediction models, their prognostic value must be measurable and comparable. This thesis focuses on time-to-event data analysed with the Cox proportional hazards model, for which many measures of prognostic ability have been proposed. A measure of discrimination, the D statistic (Royston and Sauerbrei, 2004), is chosen for this work because it has an appealing interpretation and a direct relationship with a measure of explained variation. Real datasets are used to investigate how estimates of D vary with the number of events.

Seeking a better alternative to EPV rules, two sample size calculations are developed and tested for use where a target value of D is estimated: one based on significance testing and one based on confidence interval width. The calculations are illustrated using real datasets; in general, the required sample sizes are quite large.

Finally, the usability of the new calculations is considered. To use the sample size calculations, researchers must estimate a target value of D, but this can be difficult if no previous study is available.
To aid this, published D values from prognostic studies are collated into a 'library', which could be used to obtain plausible values of D for the calculations. To expand the library further, an empirical conversion is developed to transform values of the more widely used C-index (Harrell et al., 1984) to D.
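Several of the quantities named in the abstract can be illustrated with a short Python sketch, offered under stated assumptions rather than as the thesis's method. `events_from_epv` is the plain EPV arithmetic (events = EPV × candidate parameters). `r2_d` applies the relationship between D and explained variation described by Royston and Sauerbrei (2004) for the Cox model, using the scaling constant κ = √(8/π) and an error variance of π²/6. `events_for_ci_width` is a generic precision calculation that assumes var(D̂) ≈ λ/e for e events, where λ is a hypothetical user-supplied constant. The variance model actually developed in the thesis is not reproduced here.

```python
import math
from statistics import NormalDist

# Scaling constant relating D to the log hazard ratio scale (Royston and Sauerbrei, 2004).
KAPPA = math.sqrt(8 / math.pi)
# Error variance of the standard extreme-value distribution underlying the Cox model.
SIGMA2_COX = math.pi ** 2 / 6


def events_from_epv(n_parameters: int, epv: float = 10.0) -> int:
    """Events required under an events-per-variable rule of thumb."""
    return math.ceil(epv * n_parameters)


def r2_d(d: float) -> float:
    """Explained variation R^2_D implied by a value of D for a Cox model."""
    scaled = d ** 2 / KAPPA ** 2
    return scaled / (scaled + SIGMA2_COX)


def events_for_ci_width(lam: float, half_width: float, level: float = 0.95) -> int:
    """Events needed so a Wald confidence interval for D has the requested half-width.

    ASSUMPTION (hypothetical, not from the thesis): var(D_hat) is approximately lam / e,
    where e is the number of events and lam must be supplied by the user.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for a 95% interval
    return math.ceil(z ** 2 * lam / half_width ** 2)


if __name__ == "__main__":
    print(events_from_epv(12))               # 10 EPV with 12 candidate parameters
    print(round(r2_d(1.0), 3))               # explained variation implied by D = 1
    print(events_for_ci_width(2.0, 0.1))     # hypothetical lam = 2, half-width 0.1
```

For example, `events_from_epv(12)` returns 120 and `r2_d(1.0)` is about 0.19. The CI-width helper only shows the general shape of a precision-based calculation: the required number of events grows with the variance constant and with the inverse square of the target half-width.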

Type: Thesis (Doctoral)
Title: Sample size for multivariable prognostic models
Open access status: An open access version is available from UCL Discovery
Language: English
UCL classification: UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: https://discovery.ucl.ac.uk/id/eprint/1354112

