UCL Discovery

How much data are required to develop and validate a risk prediction model?

Taiyari, Khadijeh; (2017) How much data are required to develop and validate a risk prediction model? Doctoral thesis (Ph.D), UCL (University College London). Green open access

Khadijeh Taiyari PhD Thesis.pdf - Accepted Version


Abstract

It has been suggested that when developing risk prediction models using regression, the number of events in the dataset should be at least 10 times the number of parameters being estimated by the model. This rule was originally proposed to ensure the unbiased estimation of regression coefficients with confidence intervals that have correct coverage. However, only limited research has assessed the adequacy of this rule with regard to predictive performance. Furthermore, there is only limited guidance on the number of events required to develop risk prediction models using hierarchical data, for example when observations come from several hospitals. The first main aim of this dissertation is to determine the number of events required to obtain reliable predictions from standard or hierarchical models for binary outcomes. This is achieved by conducting several simulation studies based on real clinical data. It has also been suggested that when validating risk prediction models, there should be at least 100 events in the validation dataset. However, few studies have examined the adequacy of this recommendation, and there are no guidelines on the sample size requirements for validating a risk prediction model based on hierarchical data. The second main aim of this dissertation is to investigate the sample size requirements for model validation using both simulation and analytical methods. In particular, we derive the relationship between sample size and the precision of some common measures of model performance, such as the C statistic, D statistic, and calibration slope. The results from this dissertation will enable researchers to better assess their sample size requirements when developing and validating prediction models using both standard (independent) and clustered data.
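As a rough illustration of the two rules of thumb quoted in the abstract (at least 10 events per estimated parameter for development, at least 100 events for validation), the arithmetic can be sketched as follows. This is a minimal sketch and not the method of the thesis: the function names, the example of 8 parameters, and the 25% event rate are all hypothetical, and the thesis itself questions whether these rules are adequate.

```python
import math


def min_events_for_development(n_parameters, events_per_parameter=10):
    """Minimum number of events under the events-per-parameter rule of thumb.

    The default of 10 is the rule quoted in the abstract; it is a
    convention, not a guarantee of good predictive performance.
    """
    if n_parameters < 1:
        raise ValueError("n_parameters must be at least 1")
    return n_parameters * events_per_parameter


def min_sample_size(n_events, event_rate):
    """Total sample size expected to yield n_events at a given event rate.

    event_rate is the assumed proportion of subjects who experience the
    outcome (hypothetical input, to be taken from prior data).
    """
    if not 0 < event_rate <= 1:
        raise ValueError("event_rate must be in (0, 1]")
    return math.ceil(n_events / event_rate)


# Hypothetical example: a model with 8 parameters and a 25% event rate.
dev_events = min_events_for_development(8)
dev_n = min_sample_size(dev_events, 0.25)

# Validation rule of thumb quoted in the abstract: at least 100 events.
val_n = min_sample_size(100, 0.25)

print(f"development: {dev_events} events, about {dev_n} subjects")
print(f"validation: 100 events, about {val_n} subjects")
```

Both calculations treat observations as independent; they make no allowance for clustering, which is one of the settings the thesis examines.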

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: How much data are required to develop and validate a risk prediction model?
Event: UCL (University College London)
Open access status: An open access version is available from UCL Discovery
Language: English
UCL classification: UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: https://discovery.ucl.ac.uk/id/eprint/10039739
