UCL Discovery

Harnessing haplotype sharing information from low coverage sequencing and sparsely genotyped data

Morris, Sam (2022) Harnessing haplotype sharing information from low coverage sequencing and sparsely genotyped data. Doctoral thesis (Ph.D), UCL (University College London). Green open access.

Abstract

Accounting for linkage disequilibrium between neighbouring genetic markers has been shown to enhance power to detect fine-scale genetic population structure, particularly when considering recent shared ancestry. In particular, ChromoPainter has been shown to be a successful method for identifying shared haplotypes between samples. It has also been widely used on ancient DNA samples. However, sequencing coverage is a potentially confounding factor, and analysing low-coverage samples may produce biased results. Whilst a small number of studies have tested the utility of using ChromoPainter on ancient DNA, none have tested samples across a range of coverages at every step of the analysis pipeline. In this work, I assess the impact of coverage on each step of the ChromoPainter analysis pipeline. I show that bias can exist when exploring population structure using low-coverage samples, and investigate a series of modifications and strategies to reduce the extent of this bias. I also address a related challenge of analysing haplotype information in sparsely genotyped data in present-day individuals; for example, when analysing only variants that overlap multiple genotyping arrays. Using these findings, I infer fine-scale African ancestry in U.K. Biobank participants using a new reference panel of data from 349 African ethno-linguistic groups, demonstrating how imputation of sparsely genotyped samples can substantially harm the estimation of sub-continental ancestry. Furthermore, I analyse a novel ancient DNA dataset from Bavaria in order to determine the extent of continuity between the Late Neolithic and Iron Ages, as well as the age of east-west structure in Europe. I also analyse novel ancient DNA samples from Slavic-speaking regions, exploring the genetic relationship between samples from the Migration Era to the Early Middle Ages, and the signatures of these ancient populations in present-day Slavic-speaking populations.
Finally, I summarise my findings and recommend approaches for future work on haplotype-based studies using low-coverage or sparsely genotyped data.

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Harnessing haplotype sharing information from low coverage sequencing and sparsely genotyped data
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2022. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL
URI: https://discovery.ucl.ac.uk/id/eprint/10152876