Ji, Jiayi; Kapli, Paschalia; Flouri, Tomas; Yang, Ziheng; (2025) The Impact of Sequencing and Genotyping Errors on Bayesian Analysis of Genomic Data under the Multispecies Coalescent Model. Molecular Biology and Evolution, 42 (8), Article msaf184. 10.1093/molbev/msaf184.
Abstract
The multispecies coalescent (MSC) model accounts for genealogical fluctuations across the genome and provides a framework for analyzing genomic data from closely related species to estimate species phylogenies and divergence times, infer interspecific gene flow, and delineate species boundaries. As the MSC model assumes correct sequences, sequencing and genotyping errors at low read depths may be a serious concern. Here, we use computer simulation to assess the impact of genotyping errors in phylogenomic data on Bayesian inference of the species tree and population parameters such as species split times, population sizes, and the rate of gene flow. The base-calling error rate is extremely influential. At the low rate of e = 0.001 (Phred score of 30), estimation of species trees and population parameters is little affected by genotyping errors even at the low depth of ∼3×. At high error rates (e = 0.005 or 0.01) and low depths (less than 10×), genotyping errors can reduce the power of species tree estimation and introduce biases in estimates of population sizes, species divergence times, and the rate of gene flow. Treating heterozygotes in the sequences as missing data (ambiguities) may reduce the impact of genotyping errors. Our simulation suggests that it is preferable in terms of inference precision and accuracy to sequence a few samples at high depths rather than many samples at low depths.
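The abstract equates a base-calling error rate of e = 0.001 with a Phred score of 30, which follows from the standard definition Q = -10 log10(e). The sketch below illustrates that conversion for the three error rates mentioned. It also includes a deliberately simplified illustration of why low depth is a concern for heterozygotes: assuming each read samples one of the two alleles with probability 1/2 and ignoring base-calling errors, it computes the chance that every read at a heterozygous site shows the same allele. The function names and the simplified read-sampling model are assumptions made for illustration and are not taken from the paper's simulation design.

```python
import math


def error_to_phred(e: float) -> float:
    """Convert a base-calling error probability to a Phred quality score."""
    if not 0.0 < e <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(e)


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score back to an error probability."""
    return 10.0 ** (-q / 10.0)


def prob_het_reads_all_one_allele(depth: int) -> float:
    """Chance that all reads at a heterozygous site carry the same allele.

    Assumption (illustrative only): each read samples either allele with
    probability 1/2, independently, and there are no base-calling errors.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 2.0 * 0.5 ** depth


if __name__ == "__main__":
    # Error rates discussed in the abstract.
    for e in (0.001, 0.005, 0.01):
        print(f"e = {e}: Phred score = {error_to_phred(e):.1f}")

    # Example depths, chosen here only for illustration.
    for depth in (3, 10, 30):
        p = prob_het_reads_all_one_allele(depth)
        print(f"depth {depth}x: P(all reads show one allele) = {p:.3g}")
```

Under these simplifying assumptions, a heterozygous site covered by 3 reads has a 0.25 chance of showing only one allele, which gives some intuition for why low depth can matter even before base-calling errors are considered.
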
| Type: | Article |
|---|---|
| Title: | The Impact of Sequencing and Genotyping Errors on Bayesian Analysis of Genomic Data under the Multispecies Coalescent Model |
| Location: | United States |
| Open access status: | An open access version is available from UCL Discovery |
| DOI: | 10.1093/molbev/msaf184 |
| Publisher version: | https://doi.org/10.1093/molbev/msaf184 |
| Language: | English |
| Additional information: | © The Author(s) 2025. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
| Keywords: | Science & Technology, Life Sciences & Biomedicine, Biochemistry & Molecular Biology, Evolutionary Biology, Genetics & Heredity, Bpp, introgression, migration, multispecies coalescent, read depth, species tree, ANCESTRAL POPULATION SIZES, SPECIES TREES, GENE TREES, INFERENCE |
| UCL classification: | UCL; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Genetics, Evolution and Environment |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10217618 |

