UCL Discovery

Handling missing data when estimating causal effects with targeted maximum likelihood estimation

Dashti, S Ghazaleh; Lee, Katherine J; Simpson, Julie A; White, Ian R; Carlin, John B; Moreno-Betancur, Margarita (2024) Handling missing data when estimating causal effects with targeted maximum likelihood estimation. American Journal of Epidemiology, Article kwae012. 10.1093/aje/kwae012. (In press). Green open access

kwae012.pdf - Published Version

Abstract

Targeted maximum likelihood estimation (TMLE) is increasingly used for doubly robust causal inference, but how missing data should be handled when using TMLE with data-adaptive approaches is unclear. Based on data (1992-1998) from the Victorian Adolescent Health Cohort Study, we conducted a simulation study to evaluate 8 missing-data methods in this context: complete-case analysis, extended TMLE incorporating an outcome-missingness model, the missing covariate missing indicator method, and 5 multiple imputation (MI) approaches using parametric or machine-learning models. We considered 6 scenarios that varied in terms of exposure/outcome generation models (presence of confounder-confounder interactions) and missingness mechanisms (whether outcome influenced missingness in other variables and presence of interaction/nonlinear terms in missingness models). Complete-case analysis and extended TMLE had small biases when outcome did not influence missingness in other variables. Parametric MI without interactions had large bias when exposure/outcome generation models included interactions. Parametric MI including interactions performed best in bias and variance reduction across all settings, except when missingness models included a nonlinear term. When choosing a method for handling missing data in the context of TMLE, researchers must consider the missingness mechanism and, for MI, compatibility with the analysis method. In many settings, a parametric MI approach that incorporates interactions and nonlinearities is expected to perform well.

Type: Article
Title: Handling missing data when estimating causal effects with targeted maximum likelihood estimation
Location: United States
Open access status: An open access version is available from UCL Discovery
DOI: 10.1093/aje/kwae012
Publisher version: https://doi.org/10.1093/aje/kwae012
Language: English
Additional information: © The Author(s) 2024. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: causal inference, missing data, multiple imputation, targeted maximum likelihood estimation
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Inst of Clinical Trials and Methodology
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Inst of Clinical Trials and Methodology > MRC Clinical Trials Unit at UCL
URI: https://discovery.ucl.ac.uk/id/eprint/10192195