UCL Discovery

Evolutionary analysis of protein structural families to assist comparative modelling of genome sequences

Reeves, Gabrielle Anne (2003) Evolutionary analysis of protein structural families to assist comparative modelling of genome sequences. Doctoral thesis (Ph.D), UCL (University College London). Green open access.


The CATH domain database clusters closely related structures (>35% sequence identity) into families. More distant evolutionary links between these families are identified by common sequence patterns and by functional and structural motifs, so that the families can be clustered further into homologous superfamilies. Relatives in these homologous superfamilies share core structural similarity. However, in some superfamilies extensive structural embellishments are observed. This thesis presents an analysis of the structural variability of the homologous superfamilies in the CATH database, focusing on the secondary structure embellishments present in many of the more variable families. It was found that secondary structure elements are inserted at a number of places in the peptide chain but are often co-located in the three-dimensional structure. Using this information, a protocol is developed to correlate the structural embellishments with the functional changes observed in three particularly variable families: the ATP-dependent carboxylase-amine/thiol ligase (ATP-grasp) superfamily, the cupredoxin superfamily and the thioredoxin superfamily. A number of conclusions are drawn from this structural analysis. The embellishments often mediate the domain interfaces, as illustrated in the cupredoxin and ATP-grasp superfamilies. Additionally, modifications to the active sites occur through the addition of secondary structure elements. In the ATP-grasp superfamily, a large embellishment encloses the active site in some members.
Experimental techniques for solving the three-dimensional structure of a protein, primarily NMR and X-ray crystallography, are often hampered by technical limitations that make them time-consuming, and so comparative modelling techniques are being explored to create theoretical three-dimensional structural models. The second part of this thesis considers ways of using comparative modelling to model genome sequences that have been assigned to CATH homologous superfamilies.
An automatic comparative modelling pipeline (GenMod) has been developed in which genome sequences are aligned and modelled using publicly available software in an optimised protocol. GenMod was tested using a large dataset of 140 relatives from CATH superfamilies. Software to assess the quality of these models was selected and tested. One of the main areas reported to need improvement in current comparative modelling techniques is parent selection, and here a novel method is explored. Sets of parent structures have been created from structural sub-groups within each homologous superfamily. Regions from each of these parents were then selected by sequence similarity to create a final structural template. Results from the analysis showed that below 30% none of the methods performed well, above 55% the closest relative is the best parent, and between 30% and 55% the best method uses multiple parents. This work was generously supported by the Biotechnology and Biological Sciences Research Council.
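The parent-selection thresholds reported in the abstract can be read as a simple decision rule. The Python sketch below is an illustration only and is not taken from the thesis or from GenMod: the names `ParentStrategy` and `choose_parent_strategy` are hypothetical, the percentages are assumed to be target-to-parent sequence identity (the abstract does not state the measure), and the boundary values of exactly 30% and 55% are assumed to fall in the multiple-parent band.

```python
from enum import Enum


class ParentStrategy(Enum):
    """Hypothetical labels for the three regimes described in the abstract."""
    NONE_RELIABLE = "no tested method performed well"
    MULTIPLE_PARENTS = "build the template from multiple parents"
    CLOSEST_RELATIVE = "use the closest relative as the single parent"


def choose_parent_strategy(percent: float) -> ParentStrategy:
    """Map a percentage to the regime reported in the abstract.

    Assumption: `percent` is the sequence identity between the target
    sequence and its closest available parent structure.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must lie between 0 and 100")
    if percent < 30.0:
        return ParentStrategy.NONE_RELIABLE
    if percent > 55.0:
        return ParentStrategy.CLOSEST_RELATIVE
    # Assumption: 30% and 55% themselves belong to the middle band.
    return ParentStrategy.MULTIPLE_PARENTS


if __name__ == "__main__":
    for value in (22.0, 41.5, 78.0):
        print(f"{value:5.1f}% -> {choose_parent_strategy(value).value}")
```

A pipeline of this kind could call such a function once per target sequence before template construction. How the thesis actually handles the boundary cases is not stated in the abstract.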

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Evolutionary analysis of protein structural families to assist comparative modelling of genome sequences
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Thesis digitised by ProQuest.
Keywords: Biological sciences; Protein structural families
URI: https://discovery.ucl.ac.uk/id/eprint/10101329