UCL Discovery

A comparison of strategies for selecting auxiliary variables for multiple imputation

Mainzer, RM; Nguyen, CD; Carlin, JB; Moreno-Betancur, M; White, IR; Lee, KJ; (2024) A comparison of strategies for selecting auxiliary variables for multiple imputation. Biometrical Journal, 66 (1), Article 2200291. 10.1002/bimj.202200291. Green open access

A comparison of strategies for selecting auxiliary variables for multiple imputation.pdf - Published Version

Abstract

Multiple imputation (MI) is a popular method for handling missing data. Auxiliary variables can be added to the imputation model(s) to improve MI estimates. However, the choice of which auxiliary variables to include is not always straightforward. Several data-driven auxiliary variable selection strategies have been proposed, but there has been limited evaluation of their performance. Using a simulation study, we evaluated the performance of eight auxiliary variable selection strategies: (1, 2) two versions of selection based on correlations in the observed data; (3) selection using hypothesis tests of the “missing completely at random” assumption; (4) replacing auxiliary variables with their principal components; (5, 6) forward and forward stepwise selection; (7) forward selection based on the estimated fraction of missing information; and (8) selection via the least absolute shrinkage and selection operator (LASSO). A complete case analysis and an MI analysis using all auxiliary variables (the “full model”) were included for comparison. We also applied all strategies to a motivating case study. The full model outperformed all auxiliary variable selection strategies in the simulation study, with the LASSO strategy being the best-performing auxiliary variable selection strategy overall. All MI analysis strategies that we were able to apply to the case study led to similar estimates, although computational time was substantially reduced when variable selection was employed. This study provides further support for adopting an inclusive auxiliary variable strategy where possible. Auxiliary variable selection using the LASSO may be a promising alternative when the full model fails or is too burdensome.
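
To illustrate the general idea behind the first family of strategies named in the abstract (selection based on correlations in the observed data), here is a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify how the two correlation-based versions are defined. The function names `pearson` and `select_auxiliaries`, the 0.4 cutoff, the toy data, and the assumption that every candidate auxiliary variable is fully observed are all hypothetical choices made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when the correlation is undefined (fewer than two
    points, or a constant sequence).
    """
    n = len(x)
    if n < 2:
        return 0.0
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_auxiliaries(target, candidates, threshold=0.4):
    """Keep candidates whose absolute correlation with the incomplete
    variable, computed on rows where that variable is observed, is at
    least `threshold`.

    target:     list of numbers, with None marking a missing value
    candidates: dict mapping a variable name to a fully observed list
                (an assumption of this sketch)
    threshold:  hypothetical cutoff, not taken from the paper
    """
    observed_rows = [i for i, v in enumerate(target) if v is not None]
    y = [target[i] for i in observed_rows]
    selected = []
    for name, values in candidates.items():
        x = [values[i] for i in observed_rows]
        if abs(pearson(x, y)) >= threshold:
            selected.append(name)
    return selected


# Toy data (invented): one incomplete variable and two candidates.
target = [1.0, 2.0, None, 4.0, 5.0, None]
candidates = {
    "aux_strong": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
    "aux_weak": [3.0, 1.0, 2.0, 1.0, 3.0, 2.0],
}
selected = select_auxiliaries(target, candidates)
```

In this toy example only `aux_strong` passes the cutoff, so it alone would be added to the imputation model. The abstract's main finding still applies: the full model outperformed every selection strategy in the simulation study, so a filter like this is something to consider only when including all auxiliary variables is infeasible.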

Type: Article
Title: A comparison of strategies for selecting auxiliary variables for multiple imputation
Location: Germany
Open access status: An open access version is available from UCL Discovery
DOI: 10.1002/bimj.202200291
Publisher version: https://doi.org/10.1002/bimj.202200291
Language: English
Additional information: © 2024 The Authors. Biometrical Journal published by Wiley-VCH GmbH. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Keywords: Imputation model, missing data, variable selection, Computer Simulation
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Inst of Clinical Trials and Methodology
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Inst of Clinical Trials and Methodology > MRC Clinical Trials Unit at UCL
URI: https://discovery.ucl.ac.uk/id/eprint/10186517