eprintid: 10199444
rev_number: 7
eprint_status: archive
userid: 699
dir: disk0/10/19/94/44
datestamp: 2024-11-04 10:41:19
lastmod: 2024-11-04 10:41:19
status_changed: 2024-11-04 10:41:19
type: article
metadata_visibility: show
sword_depositor: 699
creators_name: Qian, Zhaozhi
creators_name: Callender, Thomas
creators_name: Cebere, Bogdan
creators_name: Janes, Sam M
creators_name: Navani, Neal
creators_name: van der Schaar, Mihaela
title: Synthetic data for privacy-preserving clinical risk prediction
ispublished: pub
divisions: UCL
divisions: B02
divisions: C10
divisions: D17
divisions: K71
keywords: Synthetic data, Machine learning, Risk-prediction, Outcomes research, Translational research
note: © The Author(s), 2024. 
This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
abstract: Synthetic data promise privacy-preserving data sharing for healthcare research and development. Compared with other privacy-enhancing approaches—such as federated learning—analyses performed on synthetic data can be applied downstream without modification, such that synthetic data can act in place of real data for a wide range of use cases. However, the role that synthetic data might play in all aspects of clinical model development remains unknown. In this work, we used state-of-the-art generators explicitly designed for privacy preservation to create a synthetic version of ever-smokers in the UK Biobank before building prognostic models for lung cancer under several data release assumptions. We demonstrate that synthetic data can be effectively used throughout the medical prognostic modeling pipeline even without eventual access to the real data. Furthermore, we show the implications of different data release approaches on how synthetic biobank data could be deployed within the healthcare system.
date: 2024-10-27
date_type: published
publisher: Springer Science and Business Media LLC
official_url: https://doi.org/10.1038/s41598-024-72894-y
oa_status: green
full_text_type: pub
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 2332204
doi: 10.1038/s41598-024-72894-y
medium: Electronic
pii: 10.1038/s41598-024-72894-y
lyricists_name: Janes, Samuel
lyricists_name: Callender, Thomas
lyricists_id: SMJAN15
lyricists_id: TCALL19
actors_name: Callender, Thomas
actors_id: TCALL19
actors_role: owner
funding_acknowledgements: EICEDAAP\100012 [Cancer Research UK]
full_text_status: public
publication: Scientific Reports
volume: 14
article_number: 25676
event_location: England
issn: 2045-2322
citation: Qian, Zhaozhi; Callender, Thomas; Cebere, Bogdan; Janes, Sam M; Navani, Neal; van der Schaar, Mihaela (2024) Synthetic data for privacy-preserving clinical risk prediction. Scientific Reports, 14, Article 25676. 10.1038/s41598-024-72894-y <https://doi.org/10.1038/s41598-024-72894-y>. Green open access
 
document_url: https://discovery.ucl.ac.uk/id/eprint/10199444/1/s41598-024-72894-y.pdf