Ji, Jiayi (2025) Inference of Gene Flow from Genomic Data. Doctoral thesis (Ph.D), UCL (University College London).
Abstract
Species and populations do not evolve independently after splitting from their ancestors, and they have been found to exchange alleles when they come into contact. The process of gene flow has been documented in numerous species throughout the tree of life. The exponential growth of genomic data over the past two decades has driven a surge in studies aiming to quantify the extent of gene flow across different systems and to understand the role of gene flow during and after speciation. Most efforts have employed heuristic or approximate approaches that rely on summaries of sequence data, so the rich information for inferring species divergence and cross-species gene flow is not fully leveraged and is largely lost. Recent advances in the multispecies coalescent (MSC) model have made it a powerful framework for the inference of species trees and the estimation of gene flow under two idealized formulations: episodic introgression and continuous migration. Methods based on the MSC framework can capture more features of gene flow, including its strength, direction and timing, while also allowing for the estimation of key demographic parameters, including speciation times and population sizes. This thesis focuses on gene flow inference based on the full-likelihood methods implemented in the Bayesian program BPP. We analyse genomic data from three different species systems. First, we apply the introgression model in Chapter 2 and both the introgression and migration models in Chapter 3 to re-analyse two previously generated datasets for the chipmunk species group Tamias quadrivittatus and a Drosophila clade, identifying gene flow between both sister and non-sister species that summary methods failed to detect. Next, we compile three massive genomic datasets for chimpanzees and bonobos in Chapter 4, each comprising more than 50,000 loci. Model-based likelihood methods identify consistent migration events, whereas earlier evidence is mostly conflicting and geographically implausible.
Lastly, in Chapter 5, we use simulation to evaluate the impact of read depth on the inference of gene flow with coalescent-based methods, and we assess the influence of phasing in the analysis of data at different depths. The work in the thesis highlights the importance of using statistically adequate methods to reach reliable biological conclusions concerning cross-species gene flow. The findings from the empirical data analyses imply that introgression is pervasive and not merely an exception in species evolution.
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Inference of Gene Flow from Genomic Data |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
UCL classification: | UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Genetics, Evolution and Environment; UCL |
URI: | https://discovery.ucl.ac.uk/id/eprint/10211651 |