Scalable phylogenetic profiling using MinHash uncovers likely eukaryotic sexual reproduction genes

Moi, D; Kilchoer, L; Aguilar, PS; Dessimoz, C; (2020) Scalable phylogenetic profiling using MinHash uncovers likely eukaryotic sexual reproduction genes. PLoS Computational Biology, 16 (7), Article e1007553. 10.1371/journal.pcbi.1007553. (In press). Green open access

file.pdf - Published Version

Abstract

Phylogenetic profiling is a computational method to predict genes involved in the same biological process by identifying protein families which tend to be jointly lost or retained across the tree of life. Phylogenetic profiling has customarily been more widely used with prokaryotes than eukaryotes, because the method is thought to require many diverse genomes. There are now many eukaryotic genomes available, but these are considerably larger, and typical phylogenetic profiling methods require at least quadratic time as a function of the number of genes. We introduce a fast, scalable phylogenetic profiling approach entitled HogProf, which leverages hierarchical orthologous groups for the construction of large profiles and locality-sensitive hashing for efficient retrieval of similar profiles. We show that the approach outperforms Enhanced Phylogenetic Tree, a phylogeny-based method, and use the tool to reconstruct networks and query for interactors of the kinetochore complex as well as conserved proteins involved in sexual reproduction: Hap2, Spo11 and Gex1. HogProf enables large-scale phylogenetic profiling across the three domains of life, and will be useful to predict biological pathways among the hundreds of thousands of eukaryotic species that will become available in the coming few years. HogProf is available at https://github.com/DessimozLab/HogProf.
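The retrieval idea named in the abstract (MinHash signatures plus locality-sensitive hashing) can be illustrated with a minimal sketch. It is not HogProf's implementation: the function names, the event labels such as "presence:Homo_sapiens", and the banding parameters are all hypothetical and chosen only for illustration. The sketch assumes each profile is a set of string-encoded evolutionary events, builds a MinHash signature for it, estimates the Jaccard similarity of two profiles from their signatures, and derives band keys so that similar profiles are likely to share a bucket without comparing every pair.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """Seeded 64-bit hash; each seed acts as one independent hash function."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(events, num_perm=256):
    """Return the MinHash signature of a non-empty collection of event strings."""
    events = set(events)
    if not events:
        raise ValueError("a profile needs at least one event")
    # For each seeded hash function, keep the smallest hash value in the set.
    return [min(_hash(seed, e) for e in events) for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, used here only to check the estimate."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def lsh_band_keys(signature, bands=64):
    """Split a signature into bands; profiles sharing any band key are candidates."""
    if len(signature) % bands != 0:
        raise ValueError("signature length must be divisible by the number of bands")
    rows = len(signature) // bands
    return [(i, tuple(signature[i * rows:(i + 1) * rows])) for i in range(bands)]


if __name__ == "__main__":
    # Hypothetical profiles: made-up event labels, not data from the paper.
    profile_a = {f"presence:taxon_{i}" for i in range(40)}
    profile_b = {f"presence:taxon_{i}" for i in range(20, 60)}
    sig_a = minhash_signature(profile_a)
    sig_b = minhash_signature(profile_b)
    print("exact:", exact_jaccard(profile_a, profile_b))
    print("estimated:", estimate_jaccard(sig_a, sig_b))
    shared = set(lsh_band_keys(sig_a)) & set(lsh_band_keys(sig_b))
    print("shared band keys:", len(shared))
```

Because signatures have a fixed length, the cost of comparing two profiles does not grow with the number of genomes, and the band keys can be stored in a dictionary so that a query only touches profiles that collide in at least one band. The number of bands trades recall against the number of candidates and would need tuning for real data.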

Type: Article
Title: Scalable phylogenetic profiling using MinHash uncovers likely eukaryotic sexual reproduction genes
Open access status: An open access version is available from UCL Discovery
DOI: 10.1371/journal.pcbi.1007553
Publisher version: https://doi.org/10.1371/journal.pcbi.1007553
Language: English
Additional information: © 2020 Moi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/).
Keywords: Phylogenetics, Eukaryota, Protein interaction networks, Phylogenetic analysis, Genomics, Sexual reproduction, Forests, Fungal evolution
UCL classification: UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Genetics, Evolution and Environment
URI: https://discovery.ucl.ac.uk/id/eprint/10106389
