UCL Discovery

Estimation of Cross-Species Introgression Rates using Genomic Data Despite Model Unidentifiability

Yang, Ziheng; Flouri, Tomáš; (2022) Estimation of Cross-Species Introgression Rates using Genomic Data Despite Model Unidentifiability. Molecular Biology and Evolution 10.1093/molbev/msac083. (In press). Green open access

Full text: Yang_Estimation of Cross-Species Introgression Rates using Genomic Data Despite Model Unidentifiability_AAM.pdf - Accepted Version (2MB)

Abstract

Full likelihood implementations of the multispecies coalescent with introgression (MSci) model treat genealogical fluctuations across the genome as a major source of information to infer the history of species divergence and gene flow using multilocus sequence data. However, MSci models are known to have unidentifiability issues, whereby different models or parameters make the same predictions about the data and cannot be distinguished by the data. Previous studies have focused on heuristic methods based on gene trees and do not make an efficient use of the information in the data. Here we study the unidentifiability of MSci models under the full likelihood methods. We characterize the unidentifiability of the bidirectional introgression (BDI) model, which assumes that gene flow occurs in both directions. We derive simple rules for arbitrary BDI models, which create unidentifiability of the label-switching type. In general, an MSci model with k BDI events has 2^k unidentifiable modes or towers in the posterior, with each BDI event between sister species creating within-model parameter unidentifiability and each BDI event between non-sister species creating between-model unidentifiability. We develop novel algorithms for processing Markov chain Monte Carlo (MCMC) samples to remove label-switching problems and implement them in the BPP program. We analyze real and synthetic data to illustrate the utility of the BDI models and the new algorithms. We discuss the unidentifiability of heuristic methods and provide guidelines for the use of MSci models to infer gene flow using genomic data.
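As a rough illustration of the label-switching idea in the abstract, the Python sketch below relabels MCMC samples for a single BDI event between sister species. It is not the algorithm implemented in BPP. The parameter names (`phi_x`, `phi_y`, `theta_x`, `theta_y`), the mirror map (both introgression probabilities replaced by their complements and the two population sizes swapped), and the rule of keeping the mode with `phi_x` at most 1/2 are all assumptions made for this example and do not appear in this record.

```python
# Illustrative sketch only: NOT the BPP implementation.
# Assumption: one BDI event between sister species, with two unidentifiable
# "mirror" modes related by phi -> 1 - phi and a swap of theta_x and theta_y.

def mirror(sample):
    """Map a sample to its (assumed) mirror mode; applying it twice is the identity."""
    return {
        "phi_x": 1.0 - sample["phi_x"],
        "phi_y": 1.0 - sample["phi_y"],
        "theta_x": sample["theta_y"],
        "theta_y": sample["theta_x"],
    }


def relabel(samples):
    """Move every sample into the mode with phi_x <= 1/2 (a hypothetical simple rule)."""
    return [mirror(s) if s["phi_x"] > 0.5 else dict(s) for s in samples]


def n_modes(k):
    """Number of unidentifiable modes for k BDI events, as stated in the abstract."""
    return 2 ** k


if __name__ == "__main__":
    # Two draws that describe the same history but sit in different modes.
    chain = [
        {"phi_x": 0.1, "phi_y": 0.2, "theta_x": 0.01, "theta_y": 0.02},
        {"phi_x": 0.9, "phi_y": 0.8, "theta_x": 0.02, "theta_y": 0.01},
    ]
    for s in relabel(chain):
        print(s)
    print(n_modes(3))
```

After relabelling, posterior summaries such as means and intervals are computed within a single mode instead of averaging across mirror images, which is the practical reason for post-processing the samples. A fixed threshold rule like this one can work poorly when the posterior mass lies close to 1/2.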

Type: Article
Title: Estimation of Cross-Species Introgression Rates using Genomic Data Despite Model Unidentifiability
Location: United States
Open access status: An open access version is available from UCL Discovery
DOI: 10.1093/molbev/msac083
Publisher version: https://doi.org/10.1093/molbev/msac083
Language: English
Additional information: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: BPP, MSci, Multispecies coalescent, introgression, label-switching, unidentifiability
UCL classification: UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Genetics, Evolution and Environment
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences
URI: https://discovery.ucl.ac.uk/id/eprint/10146996