UCL Discovery

DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies

Yang, Ziheng; Kapli, Paschalia; Telford, Max; Katari, I (2023) DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies. Systematic Biology, Article syad036. 10.1093/sysbio/syad036. (In press). Green open access

Abstract

Inference of deep phylogenies has almost exclusively used protein rather than DNA sequences based on the perception that protein sequences are less prone to homoplasy and saturation or to issues of compositional heterogeneity than DNA sequences. Here, we analyze a model of codon evolution under an idealized genetic code and demonstrate that those perceptions may be misconceptions. We conduct a simulation study to assess the utility of protein versus DNA sequences for inferring deep phylogenies, with protein-coding data generated under models of heterogeneous substitution processes across sites in the sequence and among lineages on the tree, and then analyzed using nucleotide, amino acid, and codon models. Analysis of DNA sequences under nucleotide-substitution models (possibly with the third codon positions excluded) recovered the correct tree at least as often as analysis of the corresponding protein sequences under modern amino acid models. We also applied the different data-analysis strategies to an empirical dataset to infer the metazoan phylogeny. Our results from both simulated and real data suggest that DNA sequences may be as useful as proteins for inferring deep phylogenies and should not be excluded from such analyses. Analysis of DNA data under nucleotide models has a major computational advantage over protein-data analysis, potentially making it feasible to use advanced models that account for among-site and among-lineage heterogeneity in the nucleotide-substitution process in inference of deep phylogenies.

Type: Article
Title: DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies
Open access status: An open access version is available from UCL Discovery
DOI: 10.1093/sysbio/syad036
Publisher version: https://doi.org/10.1093/sysbio/syad036
Language: English
Additional information: © The Author(s) 2023. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please email: journals.permissions@oup.com. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Keywords: Amino acid models, codon models, deep phylogeny, nonhomogeneous processes, nucleotide substitution, phylogenetic information
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Genetics, Evolution and Environment
URI: https://discovery.ucl.ac.uk/id/eprint/10170888