UCL Discovery
UCL home » Library Services » Electronic resources » UCL Discovery

A Linear Time Solution to the Labeled Robinson-Foulds Distance Problem

Briand, Samuel; Dessimoz, Christophe; El-Mabrouk, Nadia; Nevers, Yannis; (2022) A Linear Time Solution to the Labeled Robinson-Foulds Distance Problem. Systematic Biology 10.1093/sysbio/syac028. (In press). Green open access

[thumbnail of syac028.pdf]
Preview
Text
syac028.pdf - Published Version

Download (726kB) | Preview

Abstract

MOTIVATION: A large variety of pairwise measures of similarity or dissimilarity have been developed for comparing phylogenetic trees, e.g. species trees or gene trees. Due to its intuitive definition in terms of tree clades and bipartitions and its computational efficiency, the Robinson-Foulds (RF) distance is the most widely used for trees with unweighted edges and labels restricted to leaves (representing the genetic elements being compared). However, in the case of gene trees, an important information revealing the nature of the homologous relation between gene pairs (orthologs, paralogs, xenologs) is the type of event associated to each internal node of the tree, typically speciations or duplications, but other types of events may also be considered, such as horizontal gene transfers. This labeling of internal nodes is usually inferred from a gene tree/species tree reconciliation method. Here, we address the problem of comparing such event-labeled trees. The problem differs from the classical problem of comparing uniformly labeled trees (all labels belonging to the same alphabet) that may be done using the Tree Edit Distance (TED) mainly due to the fact that, in our case, two different alphabets are considered for the leaves and internal nodes of the tree, and leaves are not affected by edit operations. RESULTS: We propose an extension of the RF distance to event-labeled trees, based on edit operations comparable to those considered for TED: node insertion, node deletion and label substitution. We show that this new Labeled Robinson Foulds (LRF) distance can be computed in linear time, in addition of maintaining other desirable properties: being a metric, reducing to RF for trees with no labels on internal nodes and maintaining an intuitive interpretation. The algorithm for computing the LRF distance enables novel analyses on event-label trees such as reconciled gene trees. Here, we use it to study the impact of taxon sampling on labeled gene tree inference, and conclude that denser taxon sampling yields trees with better topology but worse labeling.

Type: Article
Title: A Linear Time Solution to the Labeled Robinson-Foulds Distance Problem
Location: England
Open access status: An open access version is available from UCL Discovery
DOI: 10.1093/sysbio/syac028
Publisher version: https://doi.org/10.1093/sysbio/syac028
Language: English
Additional information: © The Author(s) 2022. Published by Oxford University Press on behalf of the Society of Systematic Biologists. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/).
UCL classification: UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Genetics, Evolution and Environment
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences
URI: https://discovery.ucl.ac.uk/id/eprint/10151885
Downloads since deposit
23Downloads
Download activity - last month
Download activity - last 12 months
Downloads by country - last 12 months

Archive Staff Only

View Item View Item