UCL Discovery

Phylogenetic approaches for detecting fragmentation in genome and transcriptome annotations

Pilizota, Ivana; (2020) Phylogenetic approaches for detecting fragmentation in genome and transcriptome annotations. Doctoral thesis (Ph.D), UCL (University College London). Green open access

Thesis_final_2020_w_ack.pdf - Accepted Version


Abstract

Genome sequencing methods, together with the corresponding assembly and annotation algorithms, have transformed biological research and innovation. Yet many assemblies and annotations remain fragmented, which limits applications that require more complete and reliable datasets. The goal of this thesis was to establish methods for detecting fragmentation in genome and transcriptome annotations by exploiting available data from related species in a phylogenetic framework. Before the core fragment-detection methods can be applied, informative sequences from related species, i.e. putative homologs, must be identified. This typically requires all-against-all protein sequence comparison within and across the species in the dataset. To speed up this process, we developed an approach that attempts to exploit the transitive property of homology and considers putative homology at the level of putative protein subsequences. The putative homologs then serve as input to our phylogenetic heuristics, which detect fragments of the same gene model in the genome assembly of interest. One heuristic collapses internal tree branches with low SH-like branch support; the other exploits a likelihood ratio value. In the challenging putative bread wheat genome, the heuristics found 1,221 pairs of distinct gene models that we believe are actually fragments of the same gene model. We also applied the heuristics to the putative genome of wild olive and identified 102 pairs of distinct gene models that are potentially fragments of the same model. Importantly, we provide guidelines for assessing predictions based on the data at hand. Finally, we began exploring the behaviour of the heuristics on transcript models constructed from the cassava transcriptome assembly. Due to time constraints, the outcomes of this last study are limited, but they hopefully provide sound guidelines for further work.
The methods are not restricted to the plant kingdom and, in their current state, can already be used on any species.
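
The branch-collapsing step mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the `Node` class, the function names, the leaf labels and the 0.9 threshold are all assumptions made for illustration, and the abstract does not say how collapsed trees are then used to call fragments. The sketch only shows how internal branches with low support can be removed so that their descendants form a polytomy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; not taken from the thesis code."""
    name: Optional[str] = None        # leaf label; None for internal nodes
    support: Optional[float] = None   # support of the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def collapse_low_support(node: Node, threshold: float) -> Node:
    """Return a copy of the tree with weakly supported internal branches collapsed."""
    new_children: List[Node] = []
    for child in node.children:
        collapsed = collapse_low_support(child, threshold)
        is_internal = bool(collapsed.children)
        weak = collapsed.support is not None and collapsed.support < threshold
        if is_internal and weak:
            # Drop the weak branch: its children attach directly to the parent.
            new_children.extend(collapsed.children)
        else:
            new_children.append(collapsed)
    return Node(node.name, node.support, new_children)


def to_newick(node: Node) -> str:
    """Serialise the topology only (no branch lengths or support values)."""
    if not node.children:
        return node.name or ""
    return "(" + ",".join(to_newick(c) for c in node.children) + ")"


# Toy tree ((A,B)0.95,(C,D)0.40) with hypothetical leaf labels.
tree = Node(children=[
    Node(support=0.95, children=[Node("A"), Node("B")]),
    Node(support=0.40, children=[Node("C"), Node("D")]),
])

collapsed_tree = collapse_low_support(tree, threshold=0.9)
print(to_newick(collapsed_tree))
```

In this toy tree the branch grouping C and D has support below the assumed threshold, so it is removed and C and D become direct children of the root, while the well-supported grouping of A and B is kept.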

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Phylogenetic approaches for detecting fragmentation in genome and transcriptome annotations
Event: UCL (University College London)
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2020. Original content in this thesis is licensed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) Licence (https://creativecommons.org/licenses/by/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences
URI: https://discovery.ucl.ac.uk/id/eprint/10100153