UCL Discovery

A protein structure based annotation of genomes

Muller, Arne (2002). A protein structure based annotation of genomes. Doctoral thesis (Ph.D), UCL (University College London). Green open access.


Abstract

A strategy for protein structure and function based annotation of genomes was developed, evaluated and applied to the proteins of several genomes, including the human genome. First, the performance of the widely used homology-based sequence comparison program PSI-BLAST in detecting distant homologous relationships (≤20% sequence identity) was evaluated. The benchmark is based on two sets of sequences from the Structural Classification Of Proteins (SCOP) database for which the homologous relationships are known. About 40% of the test proteome can be annotated via remote homologies. Common sources of error are identified. PSI-BLAST is applied to assign homologues of known structure and function to proteins of M. genitalium and M. tuberculosis. From the benchmark, the number of missed assignments and the potential extent of new structural and functional families were estimated. An automated proteome annotation system was developed to perform large-scale annotations based on analyses such as PSI-BLAST. Computationally intensive analyses can be distributed across several computers. The system is based on a relational database serving as a back-end and a software interface as a front-end. Relational storage of results from different analyses permits straightforward evaluation of results and comparison of annotations across genomes. This annotation system was applied to fourteen proteomes, including the human proteome. The extent and reliability of structural and functional annotation in these proteomes were evaluated and compared. About 40% of the human proteome can be assigned to protein folds. For 77% of the proteome there is some functional information, but only 26% of the proteome can be assigned to the standard sequence motifs that characterise function. There are substantial differences in the composition of membrane proteins between the proteomes in terms of their globular domains.
Commonly occurring structural superfamilies are identified and compared across the proteomes. The frequencies of these superfamilies lead to the estimate that 98% of the human proteome evolved by domain duplication, with four of the ten most duplicated superfamilies potentially specific to multi-cellular organisms. Occurrence of domains in repeats is more common in metazoa than in single-cellular organisms. Superfamily pairs co-occurring in the same protein sequence were analysed and compared across the proteomes. Structural superfamilies over- and under-represented in human disease genes were identified.

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: A protein structure based annotation of genomes
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Thesis digitised by ProQuest.
Keywords: Pure sciences; Biological sciences; Genomes
URI: https://discovery.ucl.ac.uk/id/eprint/10102807
