UCL Discovery

Comparative genome analysis to reveal protein evolution.

Grant, A.D.; (2006) Comparative genome analysis to reveal protein evolution. Doctoral thesis, University of London. Green open access



The completion of a substantial number of genome sequencing projects has produced more than a million protein sequences, and recent advances in computing and bioinformatics make it possible to analyse them. This thesis describes a novel automated protein classification protocol that groups proteins into families and identifies protein domain architectures through domain assignment. The resulting data are presented in the Gene3D database, which is used for the subsequent analyses. The protein family and protein domain data follow a power-law-like distribution, which is typical of many biological data sets and indicative of the small-world networks that underlie biological systems. The kingdom distribution of superfamilies and protein families in Gene3D has been used to describe the evolutionary mechanisms that determine genome diversity through protein diversity. Domain occurrence profiles have been used to identify protein domain superfamilies that are correlated with genome size in bacteria. These superfamilies are shown to exhibit a balance between metabolic and regulatory roles, in line with microeconomic principles that may determine bacterial genome size. The domain families identified in Gene3D allow the total number of protein folds in nature to be determined. Sub-clustering the domain families yields domain family sub-cluster occurrence profiles, which are shown to detect correlations and anti-correlations between domain families that cannot be detected using superfamily occurrence profiles alone. Clusters of correlated domain sub-clusters are shown to identify functionally linked groups of proteins. Finally, the data in Gene3D are used to functionally annotate the CATH database and to provide functional predictions for unannotated proteins, giving a more comprehensive functional repertoire and greater accuracy than other function prediction methods.
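The abstract relies on domain occurrence profiles: for each domain family (or superfamily), the number of domains assigned to it in each genome. Such profiles are then correlated with genome size, and with one another. The following is a minimal sketch of that idea, assuming Pearson correlation as the similarity measure; the thesis's actual measure and pipeline may differ. The function names `occurrence_profile` and `pearson`, the genome and family labels, and all counts are hypothetical toy values invented for illustration, not data from Gene3D.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumption: Pearson is used here for illustration only; the thesis may
    use a different correlation measure.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sx * sy)


def occurrence_profile(assignments, family, genomes):
    """Count domains assigned to `family` in each genome, in a fixed genome order.

    `assignments` is a list of (genome, family) pairs, one per domain
    assignment (a hypothetical input format).
    """
    counts = {g: 0 for g in genomes}
    for genome, fam in assignments:
        if fam == family and genome in counts:
            counts[genome] += 1
    return [counts[g] for g in genomes]


# Hypothetical toy data: four genomes and two domain families.
genomes = ["gA", "gB", "gC", "gD"]
genome_size = [1000, 2000, 4000, 8000]  # invented protein counts per genome
toy_counts = {
    "F1": [1, 2, 4, 8],  # grows with genome size
    "F2": [4, 3, 2, 1],  # shrinks as F1 grows
}

# Expand the toy counts into individual (genome, family) domain assignments.
assignments = [
    (g, fam)
    for fam, counts in toy_counts.items()
    for g, n in zip(genomes, counts)
    for _ in range(n)
]

# Rebuild each family's occurrence profile from the assignments.
profiles = {fam: occurrence_profile(assignments, fam, genomes) for fam in toy_counts}
size_corr = {fam: pearson(p, genome_size) for fam, p in profiles.items()}
pair_corr = pearson(profiles["F1"], profiles["F2"])

print("profiles:", profiles)
print("correlation with genome size:", size_corr)
print("F1 vs F2:", pair_corr)
```

A coefficient near +1 against genome size flags a family that expands in larger genomes, while a strongly negative coefficient between two profiles is the kind of anti-correlation the abstract refers to. With real data one would also want a significance test and a correction for multiple comparisons, which this sketch omits.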

Type: Thesis (Doctoral)
Title: Comparative genome analysis to reveal protein evolution.
Identifier: PQ ETD:592852
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Thesis digitised by ProQuest
UCL classification: UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Structural and Molecular Biology
URI: https://discovery.ucl.ac.uk/id/eprint/1445528