UCL Discovery

Reconstructing Genotypes in Private Genomic Databases from Genetic Risk Scores

Paige, B; Bell, J; Bellet, A; Gascón, A; Ezer, D; (2021) Reconstructing Genotypes in Private Genomic Databases from Genetic Risk Scores. Journal of Computational Biology, 28 (5) pp. 435-451. 10.1089/cmb.2020.0445. Green open access

cmb.2020.0445.pdf - Published Version


Abstract

Some organizations such as 23andMe and the UK Biobank have large genomic databases that they re-use for multiple different genome-wide association studies. Even research studies that compile smaller genomic databases often utilize these databases to investigate many related traits. It is common for the study to report a genetic risk score (GRS) model for each trait within the publication. Here, we show that under some circumstances, these GRS models can be used to recover the genetic variants of individuals in these genomic databases, which constitutes a reconstruction attack. In particular, if two GRS models are trained by using a largely overlapping set of participants, it is often possible to determine the genotype for each of the individuals who were used to train one GRS model, but not the other. We demonstrate this theoretically and experimentally by analyzing the Cornell Dog Genome database. The accuracy of our reconstruction attack depends on how accurately we can estimate the rate of co-occurrence of pairs of single nucleotide polymorphisms within the private database, so if this aggregate information is ever released, it would drastically reduce the security of a private genomic database. Caution should be applied when using the same database for multiple analyses, especially when a small number of individuals are included or excluded from one part of the study.
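
The intuition behind the abstract can be illustrated with a toy sketch. It is not the authors' implementation, and it rests on several simplifying assumptions that are ours: each GRS model is an ordinary least squares fit without an intercept, both models predict the same trait, the two training sets differ by exactly one individual, and the attacker knows the SNP co-occurrence (Gram) matrix of the shared cohort exactly. The names `fit_grs` and `reconstruct_genotype` are hypothetical. Under these assumptions the Gram matrix times the difference of the two weight vectors is a scalar multiple of the extra individual's genotype, which can then be rescaled to the 0/1/2 coding.

```python
import numpy as np

def fit_grs(X, y):
    """Ordinary least squares weights (no intercept), a stand-in for a GRS model."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def reconstruct_genotype(gram_a, beta_a, beta_b):
    """Recover the extra individual's genotype from two published weight vectors.

    With G = X^T X for the shared cohort, adding one row x with phenotype y_new
    gives (G + x x^T) beta_b = G beta_a + x * y_new, and therefore
    G (beta_b - beta_a) = x * (y_new - x . beta_b), a scalar multiple of x.
    """
    v = gram_a @ (beta_b - beta_a)
    nonzero = np.abs(v) > 1e-6 * np.abs(v).max()
    # Assumption: at least one SNP has genotype 1, so the smallest
    # non-zero entry of v equals the unknown scalar (including its sign).
    candidates = v[nonzero]
    scale = candidates[np.argmin(np.abs(candidates))]
    return np.rint(v / scale).astype(int)

# Synthetic data only; nothing here comes from a real genomic database.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.integers(0, 3, size=(n, p)).astype(float)  # genotypes coded 0/1/2
w = rng.normal(size=p)                              # hypothetical true effects
y = X @ w + 0.1 * rng.normal(size=n)

x_new = np.array([0.0, 1.0, 2.0, 1.0, 0.0])  # the individual in only one model
y_new = x_new @ w + 1.0                       # offset keeps the residual non-zero

beta_a = fit_grs(X, y)
beta_b = fit_grs(np.vstack([X, x_new]), np.append(y, y_new))

recovered = reconstruct_genotype(X.T @ X, beta_a, beta_b)
print(recovered, "vs true", x_new.astype(int))
```

In this sketch the attacker is handed the exact Gram matrix. With only an estimate of the pairwise co-occurrence rates, the product would no longer be an exact multiple of the genotype, which is consistent with the abstract's remark that the attack's accuracy depends on how well those rates can be estimated.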

Type: Article
Title: Reconstructing Genotypes in Private Genomic Databases from Genetic Risk Scores
Open access status: An open access version is available from UCL Discovery
DOI: 10.1089/cmb.2020.0445
Publisher version: https://doi.org/10.1089/cmb.2020.0445
Language: English
Additional information: This Open Access article is distributed under the terms of the Creative Commons License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
Keywords: genetic risk scores, genomic privacy, GWAS, long-term privacy, reconstruction attack
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10133518