The Impact of Sequencing and Genotyping Errors on Bayesian Analysis of Genomic Data under the Multispecies Coalescent Model

Ji, Jiayi; Kapli, Paschalia; Flouri, Tomas; Yang, Ziheng; (2025) The Impact of Sequencing and Genotyping Errors on Bayesian Analysis of Genomic Data under the Multispecies Coalescent Model. Molecular Biology and Evolution, 42 (8), Article msaf184. 10.1093/molbev/msaf184. Green open access

The Impact of Sequencing and Genotyping Errors on Bayesian Analysis of Genomic Data under the Multispecies Coalescent Model.pdf - Published Version


Abstract

The multispecies coalescent (MSC) model accounts for genealogical fluctuations across the genome and provides a framework for analyzing genomic data from closely related species to estimate species phylogenies and divergence times, infer interspecific gene flow, and delineate species boundaries. As the MSC model assumes correct sequences, sequencing and genotyping errors at low read depths may be a serious concern. Here, we use computer simulation to assess the impact of genotyping errors in phylogenomic data on Bayesian inference of the species tree and population parameters such as species split times, population sizes, and the rate of gene flow. The base-calling error rate is extremely influential. At the low rate of e = 0.001 (Phred score of 30), estimation of species trees and population parameters is little affected by genotyping errors even at the low depth of ∼3×. At high error rates (e = 0.005 or 0.01) and low depths (less than 10×), genotyping errors can reduce the power of species tree estimation and introduce biases in estimates of population sizes, species divergence times, and the rate of gene flow. Treating heterozygotes in the sequences as missing data (ambiguities) may reduce the impact of genotyping errors. Our simulation suggests that it is preferable in terms of inference precision and accuracy to sequence a few samples at high depths rather than many samples at low depths.
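
The correspondence between the base-calling error rate e and the Phred score quoted in the abstract follows the standard definition Q = -10 log10(e). The sketch below converts the three error rates mentioned above, and also illustrates one reason low read depth matters: under a deliberately simplified assumption (each read samples either allele of a heterozygote with probability 1/2, independently, with no base-calling errors), it computes the chance that every read at a heterozygous site shows the same allele. The function names are hypothetical, and this is not the genotype-calling or simulation procedure used in the paper.

```python
import math


def error_to_phred(e: float) -> float:
    """Convert a base-calling error rate to a Phred score, Q = -10 * log10(e)."""
    if not 0.0 < e <= 1.0:
        raise ValueError("error rate must be in (0, 1]")
    return -10.0 * math.log10(e)


def phred_to_error(q: float) -> float:
    """Convert a Phred score back to an error rate, e = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def het_single_allele_prob(depth: int) -> float:
    """Probability that all reads at a heterozygous site carry one allele.

    Simplifying assumption (not taken from the paper): reads sample the two
    alleles independently with probability 1/2 each, with no sequencing error.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 2.0 * 0.5 ** depth


if __name__ == "__main__":
    # Error rates quoted in the abstract.
    for e in (0.001, 0.005, 0.01):
        print(f"e = {e}: Phred score = {error_to_phred(e):.1f}")
    # Depths spanning the low-depth range discussed in the abstract.
    for depth in (3, 10, 30):
        p = het_single_allele_prob(depth)
        print(f"depth {depth}x: P(all reads from one allele) = {p:.6f}")
```

Under this simplified model, a heterozygous site at 3× has a one-in-four chance of showing reads from only one allele, which is consistent with the abstract's concern about genotyping at low depth; a realistic genotype caller would additionally account for base-calling errors.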

Type: Article
Title: The Impact of Sequencing and Genotyping Errors on Bayesian Analysis of Genomic Data under the Multispecies Coalescent Model
Location: United States
Open access status: An open access version is available from UCL Discovery
DOI: 10.1093/molbev/msaf184
Publisher version: https://doi.org/10.1093/molbev/msaf184
Language: English
Additional information: © The Author(s) 2025. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: Science & Technology, Life Sciences & Biomedicine, Biochemistry & Molecular Biology, Evolutionary Biology, Genetics & Heredity, Bpp, introgression, migration, multispecies coalescent, read depth, species tree, ANCESTRAL POPULATION SIZES, SPECIES TREES, GENE TREES, INFERENCE
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Genetics, Evolution and Environment
URI: https://discovery.ucl.ac.uk/id/eprint/10217618