Exploration of the variability of variable selection based on distances between bootstrap sample results

Hennig, CM; Sauerbrei, W (2019) Exploration of the variability of variable selection based on distances between bootstrap sample results. Advances in Data Analysis and Classification. DOI: 10.1007/s11634-018-00351-6. (In press). Green open access

Hennig -Sauerbrei2019_Article_ExplorationOfTheVariabilityOfV.pdf - Published Version

Abstract

It is well known that variable selection in multiple regression can be unstable and that the model uncertainty can be considerable. The model uncertainty can be quantified and explored by bootstrap resampling; see Sauerbrei et al. (Biom J 57:531–555, 2015). Here approaches are introduced that use the results of bootstrap replications of the variable selection process to obtain more detailed information about the data. Analyses will be based on dissimilarities between the results of the analyses of different bootstrap samples. Dissimilarities are computed between the vectors of predictions and between the sets of selected variables. The dissimilarities are used to map the models by multidimensional scaling, to cluster them, and to construct heatplots. Clusters can point to different interpretations of the data that could arise from different selections of variables supported by different bootstrap samples. A new measure of variable selection instability is also defined. The methodology can be applied to various regression models, estimators, and variable selection methods. It will be illustrated by three real data examples, using linear regression and a Cox proportional hazards model, and model selection by AIC and BIC.
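
As a rough illustration of the two kinds of dissimilarity mentioned in the abstract, the following minimal sketch builds pairwise dissimilarity matrices between bootstrap results. It rests on assumptions that do not come from this record: the Jaccard distance stands in for a dissimilarity between sets of selected variables, the Euclidean distance stands in for a dissimilarity between prediction vectors, and the data and the names `selected`, `predictions`, and `pairwise_matrix` are hypothetical. The article's own definitions may differ.

```python
from itertools import combinations
from math import sqrt


def jaccard_distance(a, b):
    """Jaccard distance between two sets of selected variables (an assumed choice)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:  # two empty models are treated as identical
        return 0.0
    return 1.0 - len(a & b) / len(union)


def euclidean_distance(p, q):
    """Euclidean distance between two prediction vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("prediction vectors must have equal length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def pairwise_matrix(items, dist):
    """Symmetric matrix of dissimilarities between all pairs of items."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = dist(items[i], items[j])
    return matrix


# Hypothetical variable selections obtained from three bootstrap samples.
selected = [{"x1", "x2"}, {"x1", "x3"}, {"x1", "x2"}]
# Hypothetical predictions of each selected model for the original observations.
predictions = [[1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [1.0, 2.0, 3.0]]

set_dissimilarities = pairwise_matrix(selected, jaccard_distance)
prediction_dissimilarities = pairwise_matrix(predictions, euclidean_distance)
```

Matrices of this form are the kind of input that multidimensional scaling or a clustering routine would take; those steps are left out here to keep the sketch free of external dependencies.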

Type: Article
Title: Exploration of the variability of variable selection based on distances between bootstrap sample results
Open access status: An open access version is available from UCL Discovery
DOI: 10.1007/s11634-018-00351-6
Publisher version: https://doi.org/10.1007/s11634-018-00351-6
Language: English
Additional information: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords: Linear regression, Cox proportional hazards, cluster analysis, multidimensional scaling, heatmaps
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: https://discovery.ucl.ac.uk/id/eprint/10065457