Barrigana Ramos Da Costa, RJ;
(2017)
Stochastic Models and Statistical Inference In Evolutionary Genetics: Using DNA Sequence Data To Learn About Population Divergence And Speciation.
Doctoral thesis, UCL (University College London).
Abstract
During speciation, the degree of clustering of a population in terms of genetic polymorphisms increases gradually until the exchange of genes between subpopulations is no longer possible. The isolation-with-migration (IM) model is used to estimate how long ago an ancestral population divided into two subpopulations, and to infer the level of gene flow between the subpopulations during genetic divergence. Its assumption of constant gene flow until the present is, however, particularly unrealistic in the context of two present-day species. In addition, traditional methods to fit the IM model are aimed at large numbers of DNA sequences from a small number of loci, and are computationally very expensive. To overcome these limitations, this thesis begins by focusing on an extension of the IM model in which the initial period of gene flow is followed by a period of isolation: the so-called isolation-with-initial-migration (IIM) model. For an IIM model with potentially asymmetric gene flow and unequal subpopulation sizes, the distribution of the number of nucleotide differences between two homologous DNA sequences is derived. Based on this distribution, we develop a maximum-likelihood estimation method which is appropriate for data sets containing observations from many independent loci, and is both very efficient and able to deal with mutation rate heterogeneity. Using a data set of Drosophila sequences from approximately 30,000 loci, we show how alternative models, representing different evolutionary scenarios, can be distinguished by means of likelihood ratio tests. To enable inference on both historical and contemporary rates of gene flow between two closely related species, our estimation method is extended to a generalised IM (GIM) model, in which gene flow rates and population sizes can change at some point in the past.
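The IIM distribution derived in the thesis is not reproduced on this page. As a minimal sketch of the general approach (a likelihood built from the number of pairwise differences, summed over many independent loci), the Python below uses a much simpler special case under stated assumptions: a hypothetical complete-isolation model with no gene flow at all, one pair of sequences per locus, equal subpopulation and ancestral sizes, time measured in units of 2N generations and θ = 4Nμ. Under these assumptions the number of differences is a Poisson(θτ) count from the isolation period plus a geometric count from the ancestral period. All function names are illustrative, the grid search stands in for a proper numerical optimiser, and mutation rate heterogeneity is ignored.

```python
import numpy as np


def pmf_differences(k_max, theta, tau):
    """P(S = k) for k = 0..k_max under the simplified (hypothetical) isolation model."""
    k = np.arange(k_max + 1)
    lam = theta * tau
    # Differences accumulated while the two lineages sit in separate
    # subpopulations (no coalescence possible): Poisson(theta * tau).
    # Built recursively; lam == 0 gives a point mass at zero.
    pois = np.empty(k_max + 1)
    pois[0] = np.exp(-lam)
    for j in range(1, k_max + 1):
        pois[j] = pois[j - 1] * lam / j
    # Differences accumulated in the ancestral population, where the
    # coalescence time is Exp(1): P(k) = theta^k / (1 + theta)^(k + 1).
    q = theta / (1.0 + theta)
    geom = (1.0 - q) * q ** k
    # The total is a sum of two independent counts, so its pmf is the
    # convolution; the first k_max + 1 terms are exact.
    return np.convolve(pois, geom)[: k_max + 1]


def log_likelihood(counts, theta, tau):
    """counts[k] = number of independent loci showing k differences."""
    p = pmf_differences(len(counts) - 1, theta, tau)
    return float(np.sum(counts * np.log(p)))


def simulate_loci(n_loci, theta, tau, rng):
    """Hypothetical helper: one sequence pair per locus, loci independent."""
    t_coal = rng.exponential(1.0, size=n_loci)  # coalescence time after the split
    return rng.poisson(theta * (tau + t_coal))  # differences over total time tau + T


def fit_by_grid(counts, thetas, taus):
    """Crude maximum-likelihood fit by exhaustive grid search."""
    best = (-np.inf, None, None)
    for theta in thetas:
        for tau in taus:
            ll = log_likelihood(counts, theta, tau)
            if ll > best[0]:
                best = (ll, theta, tau)
    return best


rng = np.random.default_rng(1)
diffs = simulate_loci(30_000, theta=2.0, tau=1.5, rng=rng)
counts = np.bincount(diffs)  # counts[k] = loci with k differences
max_ll, theta_hat, tau_hat = fit_by_grid(
    counts, np.linspace(0.5, 4.0, 71), np.linspace(0.0, 3.0, 61)
)
print(f"theta_hat={theta_hat:.2f}, tau_hat={tau_hat:.2f}")
```

Because each locus contributes only a single count, the likelihood depends on the data through the frequency table `counts`, which is one reason this kind of pairwise approach scales well to tens of thousands of loci. With 30,000 simulated loci the grid estimates should land close to the simulated θ = 2 and τ = 1.5.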
Finally, we show how the theory of statistical inference under model misspecification can be used to improve the accuracy of interval estimation and comparison of speciation models; and we develop a simulation method to estimate the limiting distribution of the likelihood ratio statistic when the true parameter vector lies on the boundary of the parameter space.
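The thesis's own simulation method is not reproduced here. The sketch below only illustrates the standard idea behind it, under stated assumptions: two non-negative parameters (hypothetically, two migration rates) that are both zero under the null hypothesis, and an assumed 2×2 Fisher information matrix `info`. Asymptotically, the likelihood ratio statistic then behaves like Z'IZ minus the minimum of (Z − θ)'I(Z − θ) over θ ≥ 0, with Z ~ N(0, I⁻¹). Simulating Z therefore gives draws from the limiting distribution, which for this orthant case can be checked against the known mixture of χ²₀, χ²₁ and χ²₂ with weights (1/4 − arcsin(ρ)/2π, 1/2, 1/4 + arcsin(ρ)/2π), where ρ is the correlation implied by I⁻¹. All names and numbers are illustrative.

```python
import math

import numpy as np


def simulate_boundary_lrt(info, n_draws, rng):
    """Draws from the limiting null distribution of the LRT statistic when
    both of two non-negative parameters are zero under H0 (orthant case)."""
    cov = np.linalg.inv(info)
    z = rng.multivariate_normal(np.zeros(2), cov, size=n_draws)

    def qform(d):
        # Row-wise quadratic form d' info d.
        return np.einsum("ni,ij,nj->n", d, info, d)

    to_origin = qform(z)  # fit under H0: theta = (0, 0)
    # The minimum of (z - theta)' info (z - theta) over theta >= 0 is attained
    # at z itself (if z >= 0), on one of the two faces, or at the origin.
    inside = np.where((z[:, 0] >= 0) & (z[:, 1] >= 0), 0.0, np.inf)
    t1 = np.maximum(0.0, z[:, 0] + info[0, 1] * z[:, 1] / info[0, 0])
    face1 = qform(z - np.column_stack([t1, np.zeros(n_draws)]))
    t2 = np.maximum(0.0, z[:, 1] + info[0, 1] * z[:, 0] / info[1, 1])
    face2 = qform(z - np.column_stack([np.zeros(n_draws), t2]))
    best = np.minimum.reduce([to_origin, inside, face1, face2])
    return to_origin - best


def mixture_weights(info):
    """Weights (w0, w1, w2) of chi2_0, chi2_1, chi2_2 for the 2-D orthant."""
    cov = np.linalg.inv(info)
    rho = cov[0, 1] / math.sqrt(cov[0, 0] * cov[1, 1])
    w2 = 0.25 + math.asin(rho) / (2 * math.pi)
    return (0.5 - w2, 0.5, w2)


def mixture_tail(x, weights):
    """P(statistic >= x) under the mixture, for x > 0 (chi2_0 adds nothing)."""
    _, w1, w2 = weights
    return w1 * math.erfc(math.sqrt(x / 2)) + w2 * math.exp(-x / 2)


info = np.array([[2.0, 1.2], [1.2, 1.5]])  # assumed Fisher information
rng = np.random.default_rng(2)
lrt = simulate_boundary_lrt(info, 200_000, rng)
weights = mixture_weights(info)
x_obs = 3.0  # a hypothetical observed statistic
p_sim = float(np.mean(lrt >= x_obs))
p_mix = mixture_tail(x_obs, weights)
p_naive = math.exp(-x_obs / 2)  # plain chi2(2) reference
print(f"P(LRT = 0): {np.mean(lrt < 1e-12):.3f} vs w0 = {weights[0]:.3f}")
print(f"p-value: simulated {p_sim:.4f}, mixture {p_mix:.4f}, chi2(2) {p_naive:.4f}")
```

Referring the statistic to a plain χ²₂ distribution ignores the boundary and gives a conservative (too large) p-value, because the mixture places mass on χ²₀ and χ²₁. Simulation is most useful when the geometry is less tidy than this two-parameter orthant and no closed-form weights are at hand.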
Type: | Thesis (Doctoral) |
---|---|
Title: | Stochastic Models and Statistical Inference In Evolutionary Genetics: Using DNA Sequence Data To Learn About Population Divergence And Speciation |
Event: | University College London |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Keywords: | Coalescent, speciation, gene flow, maximum-likelihood, parameters on the boundary |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/1568254 |