Wangkumhang, Pongsakorn; (2020) Fast and efficient statistical methods for detecting genetic admixture events and its applications in large-scale data cohorts. Doctoral thesis (Ph.D), UCL (University College London).
Text: Pongsakorn_E_THESIS.pdf - Submitted Version (15MB)
Abstract
Present-day cohorts of genome-wide DNA provide a powerful means of elucidating admixture events where different human groups intermixed, providing new insights into human history and population movements. The method GLOBETROTTER (Hellenthal et al., 2014) shows increased precision over other available techniques for characterising admixture due to modelling haplotype information, i.e. associations among tightly linked Single Nucleotide Polymorphisms (SNPs). However, because of its computational demands, GLOBETROTTER can only handle relatively small sample sizes of tens to hundreds of admixed individuals. In this thesis, I present a new statistical method, fastGLOBETROTTER, that both reduces computational time and increases accuracy relative to GLOBETROTTER. In particular, fastGLOBETROTTER more efficiently models admixture linkage disequilibrium by sampling sets of genomic regions within individuals that are the most informative for admixture events. Additionally, I have developed an algorithm for allocating memory more efficiently, enabling up to a 20-fold improvement in computation time relative to GLOBETROTTER. Therefore, this technique can cope with the rapidly emerging large-scale cohorts of genetically homogeneous populations sampled from small geographic regions, e.g. within a country (China Kadoorie Biobank, UK Biobank), to provide more precise estimates of admixture dates. Via simulations, I use fastGLOBETROTTER to demonstrate the sample sizes required to characterise admixture between groups with high levels of genetic similarity, and the time depths for which these approaches can reliably detect such past intermixing. I also apply fastGLOBETROTTER to over 6000 European individuals, using over 2500 individuals as ancestry surrogates, revealing new insights into admixture across Western Europe.
These include admixture events dated to ∼500-600 CE from sources carrying DNA related to present-day West Asian and North African populations found in individuals within France, Belgium and parts of Germany. I also report admixture from East-Asian/Siberian-like sources in individuals within Finland, Norway and Sweden at different times starting ∼1900 years ago.
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Fast and efficient statistical methods for detecting genetic admixture events and its applications in large-scale data cohorts |
Event: | UCL |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Copyright © The Author 2020. Original content in this thesis is licensed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) Licence (https://creativecommons.org/licenses/by/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
Keywords: | Genetic Admixture, Haplotype, Statistical Inference |
UCL classification: | UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences |
URI: | https://discovery.ucl.ac.uk/id/eprint/10094120 |