Stochastic Models and Statistical Inference In Evolutionary Genetics: Using DNA Sequence Data To Learn About Population Divergence And Speciation

Barrigana Ramos Da Costa, RJ (2017) Stochastic Models and Statistical Inference In Evolutionary Genetics: Using DNA Sequence Data To Learn About Population Divergence And Speciation. Doctoral thesis, UCL (University College London). Green open access.

Abstract

During speciation, the degree of clustering of a population in terms of genetic polymorphisms increases gradually until the exchange of genes between subpopulations is no longer possible. The isolation-with-migration (IM) model is used to estimate how long ago an ancestral population divided into two subpopulations, and to infer the level of gene flow between the subpopulations during genetic divergence. Its assumption of constant gene flow until the present is, however, particularly unrealistic in the context of two present-day species. In addition, traditional methods to fit the IM model are aimed at large numbers of DNA sequences from a small number of loci, and are computationally very expensive. To overcome these limitations, this thesis begins by focusing on an extension of the IM model in which the initial period of gene flow is followed by a period of isolation: the so-called isolation-with-initial-migration (IIM) model. For an IIM model with potentially asymmetric gene flow and unequal subpopulation sizes, the distribution of the number of nucleotide differences between two homologous DNA sequences is derived. Based on this distribution, we develop a maximum-likelihood estimation method that is appropriate for data sets containing observations from many independent loci, and is both very efficient and able to deal with mutation rate heterogeneity. Using a data set of Drosophila sequences from approximately 30,000 loci, we show how alternative models, representing different evolutionary scenarios, can be distinguished by means of likelihood ratio tests. To enable inference on both historical and contemporary rates of gene flow between two closely related species, our estimation method is extended to a generalised IM (GIM) model, in which gene flow rates and population sizes can change at some point in the past.
Finally, we show how the theory of statistical inference under model misspecification can be used to improve the accuracy of interval estimation and comparison of speciation models; and we develop a simulation method to estimate the limiting distribution of the likelihood ratio statistic when the true parameter vector lies on the boundary of the parameter space.
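
Illustrative note (not part of the thesis): the boundary problem mentioned at the end of the abstract can be seen in a toy setting. The sketch below is not the simulation method developed in the thesis and does not involve the IM, IIM or GIM models. It assumes unit-variance normal data and tests mu = 0 against mu >= 0, so that the null value lies on the boundary of the parameter space. The function names `simulate_lr_statistics` and `empirical_quantile` are hypothetical helpers introduced only for this illustration. In this toy case the likelihood ratio statistic is exactly zero in roughly half of the replicates, so a critical value taken from a chi-square distribution with one degree of freedom would tend to be too conservative.

```python
import random


def simulate_lr_statistics(n_obs=50, n_reps=4000, seed=1):
    """Simulate the likelihood ratio statistic under H0: mu = 0.

    Toy assumption: data are N(mu, 1) and the alternative is mu >= 0,
    so the true value mu = 0 sits on the boundary of the parameter space.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_reps):
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n_obs)) / n_obs
        # Maximum-likelihood estimate constrained to mu >= 0.
        mu_hat = max(sample_mean, 0.0)
        # 2 * (loglik(mu_hat) - loglik(0)) simplifies to n * mu_hat^2
        # when the variance is known and equal to one.
        stats.append(n_obs * mu_hat ** 2)
    return stats


def empirical_quantile(values, prob):
    """Return a simple empirical quantile (no interpolation)."""
    ordered = sorted(values)
    index = min(int(prob * len(ordered)), len(ordered) - 1)
    return ordered[index]


if __name__ == "__main__":
    lr_stats = simulate_lr_statistics()
    zero_fraction = sum(1 for s in lr_stats if s == 0.0) / len(lr_stats)
    print("fraction of replicates with LR statistic equal to zero:", zero_fraction)
    print("simulated 95% critical value:", empirical_quantile(lr_stats, 0.95))
```

The simulated critical value can be compared with 3.84, the usual 95% point of a chi-square distribution with one degree of freedom. For models with several parameters on the boundary, the limiting distribution is generally harder to write down, which is the situation where a simulation-based approach becomes attractive.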

Type: Thesis (Doctoral)
Title: Stochastic Models and Statistical Inference In Evolutionary Genetics: Using DNA Sequence Data To Learn About Population Divergence And Speciation
Event: University College London
Open access status: An open access version is available from UCL Discovery
Language: English
Keywords: Coalescent, speciation, gene flow, maximum-likelihood, parameters on the boundary
UCL classification: UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: https://discovery.ucl.ac.uk/id/eprint/1568254